Several of the Bigtop folks got together last week at ApacheCon, and this was a popular topic for the next enhancements to Spark-related components after getting 1.0 out the door. Some leading topics were:
-deployment of Spark-specific clusters
-Spark standalone, HDFS
-Spark over YARN, HDFS
-Spark on Mesos (talked to Mesos folks about working to include it in Bigtop post 1.0)
-the above plus variants of other Bigtop components (e.g., Kafka, Zeppelin, demo data generators)

One thing the group would like some help on is tests for Spark environments, so that things can be validated after build/deploy. The group would also like help enhancing the CI process, so that if you choose to deploy via Bigtop in test/prod/etc. you know things have gone through a certain amount of rigor beforehand.

Nate

-----Original Message-----
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Tuesday, April 21, 2015 12:46 PM
To: Nicholas Chammas
Cc: Spark dev list
Subject: Re: Is spark-ec2 for production use?

It could be a good idea to document this a bit. The original goals were to give people an easy way to get started with Spark and also to provide a consistent environment for our own experiments and benchmarking of Spark at the AMPLab. Over time I've noticed a huge increase in scope in terms of what people want to do, and I do know that many companies run production infrastructure based on launching the EC2 scripts.

My feeling is that the general problem of deploying Spark with other applications and frameworks is fairly well covered by projects which specifically focus on packaging and automation (e.g. Whirr, BigTop, etc). So I'd like to see a narrower focus on just getting a vanilla Spark cluster up and running, and make it clear that customization and extension of that functionality is really not in scope.

This doesn't mean discouraging people from using it for production use cases, but more that they shouldn't expect us to merge and maintain things that seek to do broader integration with other technologies, automation, etc.

- Patrick

On Tue, Apr 21, 2015 at 12:05 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> Is spark-ec2 intended for spinning up production Spark clusters?
>
> I think the answer is no.
>
> However, the docs for spark-ec2
> <https://spark.apache.org/docs/latest/ec2-scripts.html> very much
> leave that possibility open, and indeed I see many people asking
> questions or opening issues that stem from some production use case
> they are trying to fit spark-ec2 to.
>
> Here's the latest example
> <https://issues.apache.org/jira/browse/SPARK-6900?focusedCommentId=14504236&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14504236>
> of someone using spark-ec2 to power their (presumably) production service.
>
> Shouldn't we actively discourage people from using spark-ec2 in this way?
>
> I understand there's no stopping people from doing what they want with
> it, and certainly the questions and issues we receive about spark-ec2
> are still valid, even if they stem from discouraged use cases.
>
> From what I understand, spark-ec2 is intended for quick
> experimentation, one-off jobs, prototypes, and so forth.
>
> If that's the case, it's best to stress this in the docs.
>
> Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org