"The current problem with Spark is the big overhead and cost of bringing up a cluster. On a good day, it takes AWS spot instances 15 - 20 minutes to bring up a 30 node cluster. This makes it non-efficient for computations which may take only 10 - 15 minutes."
Hmm, this is a misleading message. The overhead of bringing up AWS spot instances for a Spark cluster is NOT an inherent problem of Spark. If you have a cluster that is already running, a Spark job can be started within ~100ms.

Best,

On Thu, Jun 26, 2014 at 7:15 AM, Aureliano Buendia <buendia...@gmail.com> wrote:
> On Thu, Jun 26, 2014 at 10:58 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> My first reaction was that Dataflow mapped more to Summingbird, as part
>
> Summingbird is for map/reduce. Dataflow is the third generation of
> google's map/reduce, and it generalizes map/reduce the way Spark does. See
> more about this here: http://youtu.be/wtLJPvx7-ys?t=2h37m8s
>
> It seems Dataflow is based on this paper:
> http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/FlumeJava.pdf
>
> The paper mentions a few times in-memory computation. But I'm not sure how
> much Google's implementation resembles to Spark when it comes to in-memory
> computation.
>
> The current problem with Spark is the big overhead and cost of bringing up
> a cluster. On a good day, it takes AWS spot instances 15 - 20 minutes to
> bring up a 30 node cluster. This makes it non-efficient for computations
> which may take only 10 - 15 minutes.
>
>> of it is a higher-level system for doing a specific thing in
>> batch/streaming -- aggregations.
>>
>> On Wed, Jun 25, 2014 at 8:23 PM, Aureliano Buendia <buendia...@gmail.com> wrote:
>> > Hi,
>> >
>> > Today Google announced their cloud dataflow, which is very similar to spark
>> > in performing batch processing and stream processing.
>> >
>> > How does spark compare to Google cloud dataflow? Are they solutions trying
>> > to aim the same problem?

--
Michael B. Bui, PhD
Senior Software Architect, ADATAO Inc.
www.adatao.com