The fundamental conceptual difference between windowing in DStreams and Structured Streaming is that DStreams use the arrival time of the record in Spark (aka processing time), whereas Structured Streaming uses event time. If you want to exactly replicate DStream's processing-time windows in Structured Streaming, you can just add the current timestamp as an additional column in the DataFrame and group by that:
streamingDF
  .withColumn("processing_time", current_timestamp())
  .groupBy($"key", window($"processing_time", "5 minutes"))
  .agg(sum($"value") as "total")

On Thu, Jun 28, 2018 at 2:24 AM, Gerard Maas <gerard.m...@gmail.com> wrote:

> Hi,
>
> In Structured Streaming lingo, "ReduceByKeyAndWindow" would be a window
> aggregation with a composite key.
> Something like:
>
>   stream.groupBy($"key", window($"timestamp", "5 minutes"))
>     .agg(sum($"value") as "total")
>
> The aggregate could be any supported SQL function.
> Is this what you are looking for? Otherwise, share your specific use case
> to see how it could be implemented in Structured Streaming.
>
> kr, Gerard.
>
> On Thu, Jun 28, 2018 at 10:21 AM oripwk <ori....@gmail.com> wrote:
>
>> In Structured Streaming, there's the notion of event-time windowing:
>>
>> However, this is not quite the same as DStream's windowing operations: in
>> Structured Streaming, windowing groups the data into fixed time windows,
>> and every event in a time window is associated with its group.
>>
>> In DStreams, it simply outputs all the data within a limited window of
>> time (the last 10 minutes, for example).
>>
>> The question was also asked here
>> <https://stackoverflow.com/questions/49821646/is-there-someway-to-do-the-eqivalent-of-reducebykeyandwindow-in-spark-structured>,
>> if that makes it clearer.
>>
>> How can the latter be achieved in Structured Streaming?
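
For anyone who wants to try the processing-time approach end to end, here is a minimal self-contained sketch. The socket source, the "key,value" line format, and the console sink are illustrative assumptions and not part of the thread above; only the withColumn/groupBy/window pattern comes from it.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ProcessingTimeWindowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("processing-time-windows")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: lines of the form "key,value" from a local socket.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val parsed = lines.as[String].map { line =>
      val Array(k, v) = line.split(",")
      (k, v.toLong)
    }.toDF("key", "value")

    // Stamp each record with its arrival time and window on that column,
    // which mimics DStream-style processing-time windows.
    val totals = parsed
      .withColumn("processing_time", current_timestamp())
      .groupBy($"key", window($"processing_time", "5 minutes"))
      .agg(sum($"value") as "total")

    // Illustrative sink: print updated aggregates to the console.
    val query = totals.writeStream
      .outputMode("update")
      .format("console")
      .start()

    query.awaitTermination()
  }
}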