Can you name the features that make Databricks better than Zeppelin?

Eran

On Fri, 29 Jan 2016 at 01:37 Michal Klos <[email protected]> wrote:

> We use both Databricks and EMR. We use Databricks for our exploratory /
> ad hoc use cases because their notebook is pretty badass and better than
> Zeppelin IMHO.
>
> We use EMR for our production machine learning and ETL tasks. The nice
> thing about EMR is you can use applications other than Spark. From a "tools
> in the toolbox" perspective this is very important.
>
> M
>
> On Jan 28, 2016, at 6:05 PM, Sourav Mazumder <[email protected]>
> wrote:
>
> You can also try out IBM's Spark as a service in IBM Bluemix. There you'll
> get all the required features for security, multitenancy, notebooks, and
> integration with other big data services. You can try that out for free too.
>
> Regards,
> Sourav
>
> On Thu, Jan 28, 2016 at 2:10 PM, Rakesh Soni <[email protected]>
> wrote:
>
>> At its core, EMR just launches Spark applications, whereas Databricks is
>>> a higher-level platform that also includes multi-user support, an
>>> interactive UI, security, and job scheduling.
>>>
>>> Specifically, Databricks runs standard Spark applications inside a
>>> user’s AWS account, similar to EMR, but it adds a variety of features to
>>> create an end-to-end environment for working with Spark. These include:
>>>
>>>    - Interactive UI (includes a workspace with notebooks, dashboards, a
>>>      job scheduler, point-and-click cluster management)
>>>    - Cluster sharing (multiple users can connect to the same cluster,
>>>      saving cost)
>>>    - Security features (access controls to the whole workspace)
>>>    - Collaboration (multi-user access to the same notebook, revision
>>>      control, and IDE and GitHub integration)
>>>    - Data management (support for connecting different data sources to
>>>      Spark, caching service to speed up queries)
>>>
>>> The idea is that a lot of Spark deployments soon need to bring in
>>> multiple users, different types of jobs, etc., and we want to have these
>>> built-in. But if you just want to connect to existing data and run jobs,
>>> that also works.
>>>
>>> The cluster manager in Databricks is based on Standalone mode, not YARN,
>>> but Databricks adds several features, such as allowing multiple users to
>>> run commands on the same cluster and running multiple versions of Spark.
>>> Because Databricks is also the team that initially built Spark, the service
>>> is very up to date and integrated with the newest Spark features -- e.g.
>>> you can run previews of the next release, any data in Spark can be
>>> displayed visually, etc.
>>>
>>> *From: *Alex Nastetsky <[email protected]>
>>> *Subject: **Databricks Cloud vs AWS EMR*
>>> *Date: *January 26, 2016 at 11:55:41 AM PST
>>> *To: *user <[email protected]>
>>>
>>> As a user of AWS EMR (running Spark and MapReduce), I am interested in
>>> potential benefits that I may gain from Databricks Cloud. I was wondering
>>> if anyone has used both and done a comparison / contrast between the two
>>> services.
>>>
>>> In general, which resource manager(s) does Databricks Cloud use for
>>> Spark? If it's YARN, can you also run MapReduce jobs in Databricks Cloud?
>>>
>>> Thanks.
