Re: Databricks Cloud vs AWS EMR

Eran Witkon Thu, 28 Jan 2016 22:36:29 -0800

Can you name the features that make Databricks better than Zeppelin?
Eran
On Fri, 29 Jan 2016 at 01:37 Michal Klos <[email protected]> wrote:


> We use both Databricks and EMR. We use Databricks for our exploratory /
> ad hoc use cases because their notebook is pretty badass and better than
> Zeppelin IMHO.
>
> We use EMR for our production machine learning and ETL tasks. The nice
> thing about EMR is you can use applications other than Spark. From a "tools
> in the toolbox" perspective this is very important.
>
> M
>
> On Jan 28, 2016, at 6:05 PM, Sourav Mazumder <[email protected]>
> wrote:
>
> You can also try out IBM's Spark as a service in IBM Bluemix. There you get
> all the required features for security, multitenancy, notebooks, and
> integration with other big data services. You can try that out for free too.
>
> Regards,
> Sourav
>
> On Thu, Jan 28, 2016 at 2:10 PM, Rakesh Soni <[email protected]>
> wrote:
>
>>> At its core, EMR just launches Spark applications, whereas Databricks is
>>> a higher-level platform that also includes multi-user support, an
>>> interactive UI, security, and job scheduling.
>>>
>>> Specifically, Databricks runs standard Spark applications inside a
>>> user’s AWS account, similar to EMR, but it adds a variety of features to
>>> create an end-to-end environment for working with Spark. These include:
>>>
>>>
>>>    - Interactive UI (includes a workspace with notebooks, dashboards, a
>>>      job scheduler, point-and-click cluster management)
>>>    - Cluster sharing (multiple users can connect to the same cluster,
>>>      saving cost)
>>>    - Security features (access controls to the whole workspace)
>>>    - Collaboration (multi-user access to the same notebook, revision
>>>      control, and IDE and GitHub integration)
>>>    - Data management (support for connecting different data sources to
>>>      Spark, caching service to speed up queries)
>>>
>>>
>>> The idea is that a lot of Spark deployments soon need to bring in
>>> multiple users, different types of jobs, etc., and we want to have these
>>> built in. But if you just want to connect to existing data and run jobs,
>>> that also works.
>>>
>>> The cluster manager in Databricks is based on Standalone mode, not YARN,
>>> but Databricks adds several features, such as allowing multiple users to
>>> run commands on the same cluster and running multiple versions of Spark.
>>> Because Databricks is also the team that initially built Spark, the service
>>> is very up to date and integrated with the newest Spark features -- e.g.
>>> you can run previews of the next release, any data in Spark can be
>>> displayed visually, etc.
>>>
>>> *From: *Alex Nastetsky <[email protected]>
>>> *Subject: **Databricks Cloud vs AWS EMR*
>>> *Date: *January 26, 2016 at 11:55:41 AM PST
>>> *To: *user <[email protected]>
>>>
>>> As a user of AWS EMR (running Spark and MapReduce), I am interested in
>>> potential benefits that I may gain from Databricks Cloud. I was wondering
>>> if anyone has used both and done comparison / contrast between the two
>>> services.
>>>
>>> In general, which resource manager(s) does Databricks Cloud use for
>>> Spark? If it's YARN, can you also run MapReduce jobs in Databricks Cloud?
>>>
>>> Thanks.
>>>
>>> --
>>
>>
>>
>
