Simply put:

EMR = Hadoop ecosystem (YARN, HDFS, etc.) + Spark + EMRFS + Amazon EMR API +
      selected instance types + Amazon EC2 friendly (bootstrapping)
spark-ec2 = HDFS + YARN (optional) + Spark (standalone by default) +
            any instance type

I use spark-ec2 for prototyping and have never used it in production.

just my $0.02



> On Dec 1, 2015, at 11:15 AM, Nick Chammas <nicholas.cham...@gmail.com> wrote:
> 
> Pinging this thread in case anyone has thoughts on the matter they want to 
> share.
> 
> On Sat, Nov 21, 2015 at 11:32 AM Nicholas Chammas <[hidden email]> wrote:
> Spark has come bundled with spark-ec2 
> <http://spark.apache.org/docs/latest/ec2-scripts.html> for many years. At the 
> same time, EMR has been capable of running Spark for a while, and earlier 
> this year it added "official" support 
> <https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/>.
> 
> If you're looking for a way to provision Spark clusters, there are some clear 
> differences between these 2 options. I think the biggest one would be that 
> EMR is a "production" solution backed by a company, whereas spark-ec2 is not 
> really intended for production use (as far as I know).
> 
> That particular difference in intended use may or may not matter to you, but 
> I'm curious:
> 
> What are some of the other differences between the 2 that do matter to you? 
> If you were considering these 2 solutions for your use case at one point 
> recently, why did you choose one over the other?
> 
> I'd be especially interested in hearing about why people might choose 
> spark-ec2 over EMR, since the latter option seems to have shaped up nicely 
> this year.
> 
> Nick
> 
> 
