Databricks Takes the Power of Spark and Manages It in a Simple-to-Use Platform

Working with Big Data is a challenge, and managing data pipelines or data science workflows complicates things further. In 2009, a solution emerged from UC Berkeley in the form of Spark. Spark is an engine that keeps working data in memory, rather than writing intermediate results to hard disk between steps, which allows users to transform Big Data and apply machine learning models far faster than earlier disk-based tools.

While Spark shaped the way Big Data is done, it wasn’t the be-all and end-all. Spark on its own takes significant time and resources to set up and manage. This cost of use makes Spark less appealing and more specialized than one would hope, limiting its overall effectiveness as a tool for data science, data engineering, and data analytics.

Managed Spark with Databricks

At DAS42, we have a principle we work by and hold dear: if a cloud-managed service for your tool exists, use it. Spark is no exception. We’ve experienced the complexity of managing multiple jobs that run on different cadences across multiple EMR clusters. Not only is it compute- and labor-intensive, but it also requires levels of expertise and specialization that are hard to find and expensive.

Databricks solves this problem by abstracting away cluster management. Databricks takes the DevOps out of Spark, creating instead a ready-to-use Unified Data Platform for data engineers, data scientists, and data analysts alike. Whether you use Spark in Scala, Python, or SQL, the Databricks Workspace provides an easy-to-access platform across your data teams.

Built for the Cloud

Along with ease of use, Databricks is both highly efficient and scalable: you can quickly spin up a cluster, run a job, and shut the cluster down. With built-in autoscaling and automatic cluster termination, Databricks does an effective job of simplifying and abstracting traditional Spark cluster management.
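
To make autoscaling and automatic termination concrete, here is a minimal sketch of what a cluster definition might look like when expressed as a JSON payload. It assumes the field names used by the Databricks Clusters API (`autoscale`, `autotermination_minutes`, and so on); the helper `build_cluster_spec`, the cluster name, and the placeholder runtime and node type values are hypothetical and only here for illustration. Check the current API documentation before relying on any of it.

```python
import json


def build_cluster_spec(name, min_workers, max_workers, idle_minutes):
    """Build a cluster definition with autoscaling and auto-termination.

    Hypothetical helper: field names are assumed to follow the
    Databricks Clusters API and should be verified against its docs.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        # Placeholders: use a runtime version and node type
        # that are actually offered in your workspace.
        "spark_version": "<runtime-version>",
        "node_type_id": "<node-type>",
        # The cluster grows and shrinks between these bounds with load.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # The cluster shuts itself down after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


spec = build_cluster_spec("nightly-etl", min_workers=2, max_workers=8, idle_minutes=30)
print(json.dumps(spec, indent=2))
```

The sketch only builds and prints the definition; it does not call any service. The point is that scaling bounds and idle shutdown are a few declared values rather than scripts you maintain yourself.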

While cluster management is a pain with Spark, log and error management are worse. Databricks surfaces logs and errors, further simplifying this part of the Spark process.

Databricks is built in the cloud with support for either AWS or Azure. This means that Databricks gracefully melds with your existing cloud platform, offering you all the scalability, security, and flexibility you’ve come to know with cloud platforms.

Databricks’s Commitment to Spark

It doesn’t take long to understand Databricks’s commitment to Spark: much of its founding team overlaps with Spark’s. Beyond this expertise, Databricks makes frequent commits to the open source Spark repository and manages the Spark certification exam.

And Now DAS42 Is Committed to Databricks

We’re excited to announce our partnership with Databricks. Whether your needs are in ETL or data science, DAS42 can help your organization come up with a plan and implement Databricks to meet them. If you'd like to check out Databricks, sign up for their trial.