On Thu, Jun 26, 2014 at 9:15 AM, Aureliano Buendia <[email protected]> wrote:
> Summingbird is for map/reduce. Dataflow is the third generation of google's
> map/reduce, and it generalizes map/reduce the way Spark does. See more about
> this here: http://youtu.be/wtLJPvx7-ys?t=2h37m8s
Yes, my point was that Summingbird is similar in that it is a higher-level
service for batch/streaming computation, not that it is similar in being
MapReduce-based.

> It seems Dataflow is based on this paper:
> http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/FlumeJava.pdf

FlumeJava maps to Crunch in the Hadoop ecosystem. I think Dataflow is more
than that, but yes, that seems to be some of the 'language'. It is similar in
that it is a distributed collection abstraction.
