1) Spark only needs to shuffle when data has to be repartitioned across the
workers in an all-to-all fashion.
2) Multi-stage jobs that would normally require several MapReduce jobs, with
data dumped to disk between them, can keep their intermediate results cached
in memory (see the sketch below).
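
To make both points concrete, here is a rough word count sketch in Scala.
The SparkContext setup, input path, and object name are placeholders I made
up for illustration, not anything from the original thread:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("wordcount").setMaster("local[*]"))

    val lines = sc.textFile("input.txt")

    // Narrow transformations: each partition is processed independently,
    // so no shuffle is needed here (point 1).
    val words = lines.flatMap(_.split("\\s+")).map(word => (word, 1))

    // reduceByKey is all-to-all: counts for the same word must end up in
    // the same partition, so this is where the shuffle happens.
    val counts = words.reduceByKey(_ + _)

    // Caching keeps the shuffled result in memory, so the two actions
    // below reuse it instead of recomputing it (or re-reading it from
    // disk, as separate MapReduce jobs would have to) (point 2).
    counts.cache()

    val distinctWords = counts.count()
    val top10 = counts.sortBy(_._2, ascending = false).take(10)

    println(s"distinct words: $distinctWords")
    top10.foreach { case (w, n) => println(s"$w\t$n") }

    sc.stop()
  }
}
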
This blog post outlines a few things that make Spark faster than MapReduce:
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
On Fri, Aug 7, 2015 at 9:13 AM, Muler wrote:
> Consider the classic word count application over a 4-node cluster with a
> sizable working data set. What makes Spark run faster than MapReduce,
> considering that Spark also has to write to disk during shuffle?