Thanks Sean. Adding user@spark.apache.org again.
On Sat, Nov 22, 2014 at 9:35 PM, Sean Owen <so...@cloudera.com> wrote:

> On Sun, Nov 23, 2014 at 2:20 AM, Soumya Simanta
> <soumya.sima...@gmail.com> wrote:
> > Is the MapReduce API "simpler" or the implementation? Almost every
> > Spark presentation has a slide that shows 100+ lines of Hadoop MR code
> > in Java and the same feature implemented in 3 lines of Scala code on
> > Spark. So the Spark API is certainly simpler, at least based on what I
> > know. What am I missing here?
>
> The implementation is simpler. The API is not. However, I don't think
> anyone 'really' uses the M/R API directly now. They use Crunch or
> maybe Cascading. These are also much less than 100 lines for word
> count, on top of M/R.
>
> > Can you please expand on what you mean by "efficient"? Better
> > performance and/or reliability, fewer resources, or something else?
>
> All of the above. Map/Reduce is simple and easy to understand, and
> Spark is actually hard to reason about, and heavy-weight. Of course,
> as soon as your work spans more than one MapReduce, this reasoning
> changes a lot. But MapReduce is better for truly map-only, or
> map-with-a-reduce-only, workloads. It is optimized for this case. The
> shuffle is still better.
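For anyone following along, the "3 lines of Scala" on those slides is the
classic word count. A minimal sketch, assuming a live SparkContext `sc` (as
provided by spark-shell) and placeholder input/output paths:

    // Classic Spark word count; `sc` is the SparkContext the shell provides.
    val counts = sc.textFile("hdfs://.../input")  // placeholder input path
      .flatMap(_.split("\\s+"))                   // split lines into words
      .map(word => (word, 1))                     // pair each word with a 1
      .reduceByKey(_ + _)                         // sum counts per word (one shuffle)
    counts.saveAsTextFile("hdfs://.../counts")    // placeholder output path

The raw M/R equivalent needs a Mapper, a Reducer, and a driver class, which
is where the 100+ line figure comes from; as Sean says, Crunch or Cascading
on top of M/R closes most of that gap.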