The fundamental conceptual difference between windowing in DStreams and in
Structured Streaming is that DStreams window on the time a record arrives in
Spark (aka processing time), while Structured Streaming windows on event
time. If you want to exactly replicate DStream's processing-time windows in
Structured Streaming, you can just add the current timestamp as an
additional column in the DataFrame and group by using that:

import spark.implicits._                  // for the $"col" syntax
import org.apache.spark.sql.functions._   // current_timestamp, window, sum

streamingDF
    .withColumn("processing_time", current_timestamp())  // stamp with processing time
    .groupBy($"key", window($"processing_time", "5 minutes"))
    .agg(sum($"value") as "total")
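
A more complete sketch, for context (the rate source and the key/value
columns here are illustrative; swap in your real source and sink):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("ProcessingTimeWindows").getOrCreate()
import spark.implicits._

// Illustrative input: derive a key and a value from the built-in rate source
val streamingDF = spark.readStream
    .format("rate")
    .option("rowsPerSecond", 10)
    .load()
    .select(($"value" % 10).cast("string") as "key", lit(1L) as "value")

val totals = streamingDF
    .withColumn("processing_time", current_timestamp())
    .groupBy($"key", window($"processing_time", "5 minutes"))
    .agg(sum($"value") as "total")

totals.writeStream
    .outputMode("update")
    .format("console")
    .start()
    .awaitTermination()

Note that window() also takes an optional slide interval, e.g.
window($"processing_time", "10 minutes", "30 seconds"), if you want
overlapping windows like DStream's sliding windows.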


On Thu, Jun 28, 2018 at 2:24 AM, Gerard Maas <gerard.m...@gmail.com> wrote:

> Hi,
>
> In Structured Streaming lingo, "ReduceByKeyAndWindow" would be a window
> aggregation with a composite key.
> Something like:
> stream.groupBy($"key", window($"timestamp", "5 minutes"))
>            .agg(sum($"value") as "total")
>
> The aggregate could be any supported SQL function.
> Is this what you are looking for? Otherwise, share your specific use case
> to see how it could be implemented in Structured Streaming.
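>
> For instance (the columns here are illustrative), any supported aggregate
> can go in the agg:
> stream.groupBy($"key", window($"timestamp", "5 minutes"))
>            .agg(avg($"value") as "mean", max($"value") as "peak")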
>
> kr, Gerard.
>
> On Thu, Jun 28, 2018 at 10:21 AM oripwk <ori....@gmail.com> wrote:
>
>> In Structured Streaming, there's the notion of event-time windowing.
>>
>> However, this is not quite the same as DStream's windowing operations: in
>> Structured Streaming, windowing groups the data by fixed time windows, and
>> every event in a time window is associated with its group.
>>
>> In DStreams, by contrast, a window simply outputs all the data received
>> within a limited window of time (the last 10 minutes, for example).
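>>
>> A minimal DStream sketch of the behavior I mean (assuming a DStream of
>> (String, Long) pairs; the names are illustrative):
>>
>> import org.apache.spark.streaming.{Minutes, Seconds}
>>
>> // Sum per key over the last 10 minutes, re-evaluated every 30 seconds
>> pairs.reduceByKeyAndWindow(_ + _, Minutes(10), Seconds(30))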
>>
>> The question was also asked here
>> <https://stackoverflow.com/questions/49821646/is-there-someway-to-do-the-eqivalent-of-reducebykeyandwindow-in-spark-structured>,
>> if that makes it clearer.
>>
>> How can the latter be achieved in Structured Streaming?
>>
