Ben, I'm just a Spark user - but at least at the Spark Summit in March, that was the main term used. Taking a step back from the details, maybe this new post from Reynold is a better intro to the Spark 2.0 highlights: https://databricks.com/blog/2016/05/11/spark-2-0-technical-preview-easier-faster-and-smarter.html
If you want to drill down, go to SPARK-8360, "Structured Streaming (aka Streaming DataFrames)". The design doc (written by Reynold in March) is very readable: https://issues.apache.org/jira/browse/SPARK-8360

Regarding directly querying (with SQL) the state managed by a streaming process - I don't know if that will land in 2.0 or only later.

Hope that helps,

Ofir Manor
Co-Founder & CTO | Equalum
Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Sun, May 15, 2016 at 11:58 PM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Hi Ofir,
>
> I just recently saw the webinar with Reynold Xin. He mentioned the
> SparkSession unification efforts, but I don't remember the Dataset for
> Structured Streaming, aka Continuous Applications as he put it. He did
> mention streaming, or unlimited, DataFrames for Structured Streaming, so
> one can directly query the data from them. Has something changed since
> then?
>
> Thanks,
> Ben
>
>
> On May 15, 2016, at 1:42 PM, Ofir Manor <ofir.ma...@equalum.io> wrote:
>
> Hi Yuval,
> let me share my understanding, based on similar questions I had.
> First, Spark 2.x aims to replace a whole bunch of its APIs with just two
> main ones - SparkSession (replacing the Hive / SQL / Spark contexts) and
> Dataset (a merge of Dataset and DataFrame - which is why it inherits all
> the Spark SQL goodness), while RDD remains a low-level API only for
> special cases. The new Dataset should also support both batch and
> streaming - eventually replacing DStream as well. See the design docs in
> SPARK-13485 (unified API) and SPARK-8360 (Structured Streaming) for a
> good intro.
> However, as you noted, not all of this will be fully delivered in 2.0.
> For example, it seems that streaming from / to Kafka using Structured
> Streaming didn't make it (so far?) into 2.0 (which is a showstopper for
> me).
> Anyway, as far as I understand, you should be able to apply stateful
> operators (non-RDD) on Datasets (for example, the new event-time window
> processing from SPARK-8360).
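To make that last point concrete, here is a rough sketch of what an event-time windowed aggregation looks like with the new API, based on the SPARK-8360 design doc and the 2.0 technical preview. The `Event` case class, field names, and the `/data/events` path are placeholders, and the API may still change before 2.0 GA:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.functions.window

// Hypothetical event type - the field names are placeholders
case class Event(word: String, eventTime: Timestamp)

val spark = SparkSession.builder
  .appName("StructuredStreamingSketch")
  .getOrCreate()
import spark.implicits._

// An unbounded Dataset, fed by new JSON files appearing in a directory
// (the file source did make it into 2.0; the path is a placeholder)
val events = spark.readStream
  .schema(Encoders.product[Event].schema)
  .json("/data/events")
  .as[Event]

// Event-time windowed aggregation: running counts per 10-minute window
val counts = events
  .groupBy(window($"eventTime", "10 minutes"), $"word")
  .count()

// Continuously write the updated aggregates to the console
val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
query.awaitTermination()
```

Note that the same `groupBy` / `count` code would work unchanged on a batch Dataset - that is the whole point of the unified API.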
> The gap I see is mostly that only a limited set of streaming sources /
> sinks has been migrated to the new (richer) API and semantics.
> Anyway, I'm pretty sure that once 2.0 gets to RC, the documentation and
> examples will be aligned with the actual offering...
>
> Ofir Manor
> Co-Founder & CTO | Equalum
> Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io
>
> On Sun, May 15, 2016 at 1:52 PM, Yuval.Itzchakov <yuva...@gmail.com>
> wrote:
>
>> I've been reading/watching videos about the upcoming Spark 2.0 release,
>> which brings us Structured Streaming. One thing I've yet to understand
>> is how this relates to the current way of working with streaming in
>> Spark, via the DStream abstraction.
>>
>> All the examples I can find, in the Spark repository and in various
>> videos, show someone streaming local JSON files or reading from
>> HDFS/S3/SQL. Also, when browsing the source, SparkSession seems to be
>> defined inside org.apache.spark.sql, which gives me a hunch that this is
>> all somehow related to SQL and the like, and not really to DStreams.
>>
>> What I'm failing to understand is: will this feature change how we do
>> streaming today? Will I be able to consume a Kafka source in a streaming
>> fashion (like we do today when we open a stream using KafkaUtils)? Will
>> we be able to do stateful operations on a Dataset[T] like we do today
>> using MapWithStateRDD? Or will there only be a subset of operations that
>> the Catalyst optimizer can understand, such as aggregations?
>>
>> I'd be happy if anyone could shed some light on this.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Structured-Streaming-in-Spark-2-0-and-DStreams-tp26959.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
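For comparison, the "today" pipeline Yuval describes - a Kafka direct stream plus `mapWithState` on the Spark 1.6 DStream API - looks roughly like this. The broker address, topic name, and checkpoint path are placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("DStreamKafkaSketch")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("/tmp/checkpoint")   // stateful operators require a checkpoint dir

// Broker address and topic name are placeholders
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))

// Keep a running count per word in Spark-managed state
// (backed by MapWithStateRDD, as mentioned in the question)
val spec = StateSpec.function((word: String, one: Option[Int], state: State[Long]) => {
  val total = state.getOption.getOrElse(0L) + one.getOrElse(0)
  state.update(total)
  (word, total)
})

stream.flatMap { case (_, line) => line.split(" ") }
  .map(word => (word, 1))
  .mapWithState(spec)
  .print()

ssc.start()
ssc.awaitTermination()
```

The open question in the thread is which parts of this (the Kafka source, the explicit key-value state) carry over to Structured Streaming in 2.0, and which only land later.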
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>