Hi Niels,

If I understand your use case correctly, you'd like to hold back all events
of a session until it ends/timesout and then write all events out.
So, instead of aggregating per session (the common use case), you'd just
like to collect the event.

I would implement a simple WindowFunction that just forwards all events
that it receives from the iterator. Conceptually, the window will just
collect the events and emit them when the session ended/timedout.
Then you can add BucketingSink which writes out the events. I'm not sure if
the BucketingSInk supports buckets based on event-time though. Maybe you
would need to adapt it a bit to guarantee that all rows of the same session
are written to the same file.
Alternatively, the WindowFunction could also emit one large record which is
a List or Array of events belonging to the same session.

Best,
Fabian

2017-06-30 13:29 GMT+02:00 Niels Basjes <ni...@basjes.nl>:

> Hi,
>
> I have the following situation:
> - I have click stream data from a website that contains a session id.
> - I then use something like EventTimeSessionWindows to group these events
> per session into a WindowedStream
>
> So I essentially end up with a stream of "finished sessions"
>
> So far I am able to do this fine.
>
> I then want to put these "finished sessions" in (parquet) files where I
> want to have files with the sessions that ENDED (or the timeout of the gap
> occurred) in a similar timeframe.
>
> So these files should be created every 5 minutes (or so) and contain all
> events of the sessions that ended/timedout in the specified 5 minutes.
>
> What I ran into is that the WindowsStream doesn't accept a sink so simply
> creating a BucketingSink and write the data doesn't work.
>
> Can anyone please give me some pointers on how to do this correctly?
>
> Thanks.
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>

Reply via email to