Re: Spark vs Google cloud dataflow

Khanderao Kand Fri, 27 Jun 2014 15:17:31 -0700

Dataflow is based on two papers: MillWheel, for stream processing, and
FlumeJava, for the programming abstraction and pipeline optimization.


MillWheel http://research.google.com/pubs/pub41378.html
FlumeJava http://dl.acm.org/citation.cfm?id=1806638

Here is my blog entry on this
http://texploration.wordpress.com/2014/06/26/google-dataflow-service-to-fight-against-amazon-kinesis/
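
As a rough illustration of the FlumeJava idea (deferred parallel collections
whose chained element-wise steps are fused into one pass before anything
runs), here is a minimal Python sketch. The names DeferredCollection,
parallel_do, plan and run are made up for this example and are not the
FlumeJava or Dataflow API; a real optimizer does far more than this.

```python
class DeferredCollection:
    """Hypothetical stand-in for a FlumeJava-style deferred collection."""

    def __init__(self, source, stages=()):
        self._source = source          # any iterable of input elements
        self._stages = tuple(stages)   # recorded steps, not yet executed

    def parallel_do(self, fn):
        """Record an element-wise step; fn maps one element to a list of outputs."""
        return DeferredCollection(self._source, self._stages + (fn,))

    def plan(self):
        """Return the names of the recorded steps without running them."""
        return [fn.__name__ for fn in self._stages]

    def run(self):
        """Execute all recorded steps fused into a single pass over the input."""
        out = []
        for item in self._source:
            batch = [item]
            for fn in self._stages:
                batch = [y for x in batch for y in fn(x)]
            out.extend(batch)
        return out


def split_words(line):
    return line.split()


def keep_long(word):
    return [word] if len(word) > 4 else []


def upper(word):
    return [word.upper()]


lines = ["spark and dataflow", "flumejava and millwheel"]
pipeline = (
    DeferredCollection(lines)
    .parallel_do(split_words)
    .parallel_do(keep_long)
    .parallel_do(upper)
)

print(pipeline.plan())  # ['split_words', 'keep_long', 'upper']
print(pipeline.run())   # ['SPARK', 'DATAFLOW', 'FLUMEJAVA', 'MILLWHEEL']
```

Nothing is computed until run() is called, which is what gives a planner the
chance to inspect and rewrite the whole pipeline first.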




On Fri, Jun 27, 2014 at 5:16 AM, Sean Owen <so...@cloudera.com> wrote:

> On Thu, Jun 26, 2014 at 9:15 AM, Aureliano Buendia <buendia...@gmail.com> wrote:
> > Summingbird is for map/reduce. Dataflow is the third generation of
> > Google's map/reduce, and it generalizes map/reduce the way Spark does.
> > See more about this here: http://youtu.be/wtLJPvx7-ys?t=2h37m8s
>
> Yes, my point was that Summingbird is similar in that it is a
> higher-level service for batch/streaming computation, not that it is
> similar for being MapReduce-based.
>
> > It seems Dataflow is based on this paper:
> > http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/FlumeJava.pdf
>
> FlumeJava maps to Crunch in the Hadoop ecosystem. I think Dataflow is
> more than that, but yeah, that seems to be some of the 'language'. It is
> similar in that it is a distributed collection abstraction.
>
