You can use Spark Streaming's updateStateByKey to do arbitrary
sessionization.
See the example -
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala
All it does is store a single number (the count of each word seen since
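
To make the mechanism concrete, here is a minimal sketch of the state-update function one would pass to updateStateByKey, in the spirit of the linked StatefulNetworkWordCount example. Only the pure update logic is shown; the StreamingContext and DStream wiring are omitted, and the object/function names are my own placeholders, not from the linked file.

```scala
object StatefulCount {
  // Update function for DStream.updateStateByKey[Int].
  // newValues:    the counts observed for this key in the current batch
  // runningCount: the state carried over from previous batches (None on
  //               the first batch in which the key appears)
  // Returning None would drop the key's state entirely.
  def updateFunction(newValues: Seq[Int],
                     runningCount: Option[Int]): Option[Int] = {
    Some(newValues.sum + runningCount.getOrElse(0))
  }
}
```

In a streaming job this would be wired up roughly as `wordCounts.updateStateByKey[Int](StatefulCount.updateFunction _)`; the state type can be anything serializable, which is what makes arbitrary sessionization (e.g. carrying a list of events plus a last-seen timestamp per session key) possible.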
Hi all,
I am trying to figure out how to perform the equivalent of "Session windows" (as
described in https://cloud.google.com/dataflow/model/windowing) using Spark
Streaming. Is it even possible (i.e., can it be done efficiently at scale)? Just
to expand on the definition:
Taken from the google da