Hi Niels, If I understand your use case correctly, you'd like to hold back all events of a session until it ends/timesout and then write all events out. So, instead of aggregating per session (the common use case), you'd just like to collect the event.
I would implement a simple WindowFunction that just forwards all events that it receives from the iterator. Conceptually, the window will just collect the events and emit them when the session ended/timedout. Then you can add BucketingSink which writes out the events. I'm not sure if the BucketingSInk supports buckets based on event-time though. Maybe you would need to adapt it a bit to guarantee that all rows of the same session are written to the same file. Alternatively, the WindowFunction could also emit one large record which is a List or Array of events belonging to the same session. Best, Fabian 2017-06-30 13:29 GMT+02:00 Niels Basjes <ni...@basjes.nl>: > Hi, > > I have the following situation: > - I have click stream data from a website that contains a session id. > - I then use something like EventTimeSessionWindows to group these events > per session into a WindowedStream > > So I essentially end up with a stream of "finished sessions" > > So far I am able to do this fine. > > I then want to put these "finished sessions" in (parquet) files where I > want to have files with the sessions that ENDED (or the timeout of the gap > occurred) in a similar timeframe. > > So these files should be created every 5 minutes (or so) and contain all > events of the sessions that ended/timedout in the specified 5 minutes. > > What I ran into is that the WindowsStream doesn't accept a sink so simply > creating a BucketingSink and write the data doesn't work. > > Can anyone please give me some pointers on how to do this correctly? > > Thanks. > > -- > Best regards / Met vriendelijke groeten, > > Niels Basjes >