Thanks Sean. Adding user@spark.apache.org again.
On Sat, Nov 22, 2014 at 9:35 PM, Sean Owen <so...@cloudera.com> wrote:

> On Sun, Nov 23, 2014 at 2:20 AM, Soumya Simanta
> <soumya.sima...@gmail.com> wrote:
> > Is the MapReduce API "simpler" or the implementation? Almost every
> > Spark presentation has a slide that shows 100+ lines of Hadoop MR code
> > in Java and the same feature implemented in 3 lines of Scala code on
> > Spark. So the Spark API is certainly simpler, at least based on what I
> > know. What am I missing here?
>
> The implementation is simpler. The API is not. However, I don't think
> anyone 'really' uses the M/R API directly now. They use Crunch or
> maybe Cascading. These are also much less than 100 lines for word
> count, on top of M/R.
>
> > Can you please expand on what you mean by "efficient"? Better
> > performance and/or reliability, fewer resources, or something else?
>
> All of the above. Map/Reduce is simple and easy to understand, and
> Spark is actually hard to reason about, and heavy-weight. Of course,
> as soon as your work spans more than one MapReduce, this reasoning
> changes a lot. But MapReduce is better for truly map-only, or
> map-with-a-reduce-only, workloads. It is optimized for this case. The
> shuffle is still better.
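For anyone following along, the "3 lines of Scala" on those slides is the
classic word count. A minimal sketch, assuming a live SparkContext `sc` (as
provided by spark-shell) and placeholder input/output paths:

    // Classic Spark word count; `sc` is the SparkContext the shell provides.
    val counts = sc.textFile("hdfs://.../input")  // placeholder input path
      .flatMap(_.split("\\s+"))                   // split lines into words
      .map(word => (word, 1))                     // pair each word with a 1
      .reduceByKey(_ + _)                         // sum counts per word (one shuffle)
    counts.saveAsTextFile("hdfs://.../counts")    // placeholder output path

The raw M/R equivalent needs a Mapper, a Reducer, and a driver class, which
is where the 100+ line figure comes from; as Sean says, Crunch or Cascading
on top of M/R closes most of that gap.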