Thank you, Amit! I was looking for this kind of information. I have not fully read your paper, but I see it contains a TODO with basically the same question(s) [1]. Maybe someone from the Spark team (including Databricks) will be so kind as to send some feedback.
Best,
Ovidiu

[1] Integrate “Structured Streaming”: //TODO - What (and how) will Spark 2.0 support (out-of-order, event-time windows, watermarks, triggers, accumulation modes) - how straight forward will it be to integrate with the Beam Model ?

> On 21 May 2016, at 23:00, Sela, Amit <ans...@paypal.com> wrote:
>
> It seems I forgot to add the link to the “Technical Vision” paper so there it
> is -
> https://docs.google.com/document/d/1y4qlQinjjrusGWlgq-mYmbxRW2z7-_X5Xax-GG0YsC0/edit?usp=sharing
>
> From: "Sela, Amit" <ans...@paypal.com>
> Date: Saturday, May 21, 2016 at 11:52 PM
> To: Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>, "user @spark" <user@spark.apache.org>
> Cc: Ovidiu Cristian Marcu <ovidiu21ma...@gmail.com>
> Subject: Re: What / Where / When / How questions in Spark 2.0 ?
>
> This is a “Technical Vision” paper for the Spark runner, which provides
> general guidelines to the future development of Spark’s Beam support as part
> of the Apache Beam (incubating) project.
> This is our JIRA -
> https://issues.apache.org/jira/browse/BEAM/component/12328915/?selectedTab=com.atlassian.jira.jira-projects-plugin:component-summary-panel
>
> Generally, I’m currently working on Datasets integration for Batch (to
> replace RDD) against Spark 1.6, and going towards enhancing Stream processing
> capabilities with Structured Streaming (2.0)
>
> And you’re welcomed to ask those questions at the Apache Beam (incubating)
> mailing list as well ;)
> http://beam.incubator.apache.org/mailing_lists/
>
> Thanks,
> Amit
>
> From: Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>
> Date: Tuesday, May 17, 2016 at 12:11 AM
> To: "user @spark" <user@spark.apache.org>
> Cc: Ovidiu Cristian Marcu <ovidiu21ma...@gmail.com>
> Subject: Re: What / Where / When / How questions in Spark 2.0 ?
>
> Could you please consider a short answer regarding the Apache Beam Capability
> Matrix todo’s for future Spark 2.0 release [4]? (some related references
> below [5][6])
>
> Thanks
>
> [4] http://beam.incubator.apache.org/capability-matrix/#cap-full-what
> [5] https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
> [6] https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
>
>> On 16 May 2016, at 14:18, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:
>>
>> Hi,
>>
>> We can see in [2] many interesting (and expected!)
>> improvements (promises) like extended SQL support, unified API (DataFrames,
>> DataSets), improved engine (Tungsten relates to ideas from modern compilers
>> and MPP databases - similar to Flink [3]), structured streaming etc. It seems
>> we somehow assist at a smart unification of Big Data analytics (Spark, Flink -
>> best of two worlds)!
>>
>> How does Spark respond to the missing What/Where/When/How questions
>> (capabilities) highlighted in the unified model Beam [1] ?
>>
>> Best,
>> Ovidiu
>>
>> [1] https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
>> [2] https://databricks.com/blog/2016/05/11/spark-2-0-technical-preview-easier-faster-and-smarter.html
>> [3] http://stratosphere.eu/project/publications/