Re: [Spark Streaming] Session based windowing like in google dataflow

2015-08-07 Thread Tathagata Das
You can use Spark Streaming's updateStateByKey to do arbitrary sessionization. See this example: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala. All it does is store a single number (count of each word seeing sin
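To make the updateStateByKey suggestion concrete, here is a minimal sketch of a per-key update function that keeps a session instead of a single count. It is not code from the thread or from the linked example. The 30-minute gap, the (start, last_seen, event_count) state layout, and the names update_session and SESSION_GAP_SECONDS are all assumptions chosen for illustration. The function is plain Python so it can be read and run without a cluster; it follows the (new_values, last_state) -> new_state shape that PySpark's DStream.updateStateByKey expects, where returning None removes the key's state.

```python
# Hypothetical sessionization sketch; names and the gap value are assumptions.
SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity gap that closes a session


def update_session(new_timestamps, session, now):
    """Fold one batch of event timestamps (epoch seconds) into a session.

    session is None or a (start, last_seen, event_count) tuple.
    now is the current time in epoch seconds, used to expire idle sessions.
    """
    for ts in sorted(new_timestamps):
        if session is None or ts - session[1] > SESSION_GAP_SECONDS:
            # Gap exceeded (or no session yet): start a new session.
            # The previous session is simply dropped here; a real job
            # would need to emit it somewhere before replacing it.
            session = (ts, ts, 1)
        else:
            start, _, count = session
            session = (start, ts, count + 1)

    # The update function also runs for keys with no new data in a batch,
    # so idle sessions can be expired here to keep state from growing.
    if session is not None and now - session[1] > SESSION_GAP_SECONDS:
        return None
    return session


# Possible wiring, assuming `events` is a DStream of (user_id, timestamp)
# pairs and a checkpoint directory is set (stateful operations need one):
#
#   import time
#   sessions = events.updateStateByKey(
#       lambda values, state: update_session(values, state, time.time()))
```

The current time is passed in as an argument rather than read inside the function so that the logic stays deterministic and easy to check outside Spark.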

[Spark Streaming] Session based windowing like in google dataflow

2015-08-07 Thread Ankur Chauhan
Hi all, I am trying to figure out how to perform the equivalent of "Session windows" (as described in https://cloud.google.com/dataflow/model/windowing) using Spark Streaming. Is it even possible (i.e., possible to do efficiently at scale)? Just to expand on the definition: Taken from the google da