Several of the Bigtop folks got together last week at ApacheCon, and this
was a popular topic: the next enhancements for Spark-related components
after getting 1.0 out the door. Some leading topics were:

-deployment of Spark-specific clusters
     -Spark standalone, HDFS
     -Spark over YARN, HDFS
     -Spark on Mesos (talked to the Mesos folks about working to include it
      in Bigtop post 1.0)
     -the above plus variants of other Bigtop components (e.g. Kafka,
      Zeppelin, demo data generators)

One thing the group would like some help on is tests for Spark environments,
so that things can be validated post build/deploy and the CI process can be
enhanced. That way, if you choose to deploy via Bigtop in test/prod/etc., you
know things have gone through a certain amount of rigor beforehand.

Nate

-----Original Message-----
From: Patrick Wendell [mailto:pwend...@gmail.com] 
Sent: Tuesday, April 21, 2015 12:46 PM
To: Nicholas Chammas
Cc: Spark dev list
Subject: Re: Is spark-ec2 for production use?

It could be a good idea to document this a bit. The original goals were to
give people an easy way to get started with Spark and also to provide a
consistent environment for our own experiments and benchmarking of Spark at
the AMPLab. Over time I've noticed a huge amount of scope increase in terms
of what people want to do and I do know that many companies run production
infrastructure based on launching the EC2 scripts.

My feeling is that the general problem of deploying Spark with other
applications and frameworks is fairly well covered by projects which
specifically focus on packaging and automation (e.g. Whirr, BigTop, etc). So
I'd like to see a narrower focus on just getting a vanilla Spark cluster up
and running and make it clear that customization and extension of that
functionality is really not in scope.

This doesn't mean discouraging people from using it for production use
cases, but more that they shouldn't expect us to merge and maintain things
that seek to do broader integration with other technologies, automation,
etc.

- Patrick

On Tue, Apr 21, 2015 at 12:05 PM, Nicholas Chammas
<nicholas.cham...@gmail.com> wrote:
> Is spark-ec2 intended for spinning up production Spark clusters?
>
> I think the answer is no.
>
> However, the docs for spark-ec2
> <https://spark.apache.org/docs/latest/ec2-scripts.html> very much 
> leave that possibility open, and indeed I see many people asking 
> questions or opening issues that stem from some production use case 
> they are trying to fit spark-ec2 to.
>
> Here's the latest example
> <https://issues.apache.org/jira/browse/SPARK-6900?focusedCommentId=14504236&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14504236>
> of someone using spark-ec2 to power their (presumably) production service.
>
> Shouldn't we actively discourage people from using spark-ec2 in this way?
>
> I understand there's no stopping people from doing what they want with 
> it, and certainly the questions and issues we receive about spark-ec2 
> are still valid, even if they stem from discouraged use cases.
>
> From what I understand, spark-ec2 is intended for quick 
> experimentation, one-off jobs, prototypes, and so forth.
>
> If that's the case, it's best to stress this in the docs.
>
> Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org


