Dataflow is based on two papers: MillWheel for stream processing, and FlumeJava for the programming abstraction and optimization.
MillWheel: http://research.google.com/pubs/pub41378.html
FlumeJava: http://dl.acm.org/citation.cfm?id=1806638

Here is my blog entry on this:
http://texploration.wordpress.com/2014/06/26/google-dataflow-service-to-fight-against-amazon-kinesis/

On Fri, Jun 27, 2014 at 5:16 AM, Sean Owen <so...@cloudera.com> wrote:

> On Thu, Jun 26, 2014 at 9:15 AM, Aureliano Buendia <buendia...@gmail.com>
> wrote:
> > Summingbird is for map/reduce. Dataflow is the third generation of
> > google's map/reduce, and it generalizes map/reduce the way Spark does.
> > See more about this here: http://youtu.be/wtLJPvx7-ys?t=2h37m8s
>
> Yes, my point was that Summingbird is similar in that it is a
> higher-level service for batch/streaming computation, not that it is
> similar for being MapReduce-based.
>
> > It seems Dataflow is based on this paper:
> > http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/FlumeJava.pdf
>
> FlumeJava maps to Crunch in the Hadoop ecosystem. I think Dataflows is
> more than that but yeah that seems to be some of the 'language'. It is
> similar in that it is a distributed collection abstraction.