Re: Writing groups of Windows to files

2017-07-04 Thread Niels Basjes
I think I understand. Since the entire session must fit into memory anyway, I'm going to try to create a new data structure which simply contains the 'entire session' and simply use a Window/BucketingSink construct to ship them to files. I do need to ensure no one can OOM the system by capping the ma
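A minimal sketch of the capping idea mentioned above, in plain Java rather than Flink code. The class name `CappedSessionBuffer`, the `maxEvents` limit, and the drop-and-count overflow policy are all assumptions made for illustration; the message does not say what exactly is capped or how overflow should be handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Flink: holds the events of one session
// and refuses to grow beyond a fixed number of events.
final class CappedSessionBuffer<T> {
    private final int maxEvents;
    private final List<T> events = new ArrayList<>();
    private long dropped = 0;

    CappedSessionBuffer(int maxEvents) {
        if (maxEvents <= 0) {
            throw new IllegalArgumentException("maxEvents must be positive");
        }
        this.maxEvents = maxEvents;
    }

    // Adds an event; returns false and counts a drop once the cap is reached.
    boolean add(T event) {
        if (events.size() >= maxEvents) {
            dropped++;
            return false;
        }
        events.add(event);
        return true;
    }

    // Read-only view of the events kept so far, in arrival order.
    List<T> events() {
        return Collections.unmodifiableList(events);
    }

    // Number of events rejected because the cap was already reached.
    long dropped() {
        return dropped;
    }

    public static void main(String[] args) {
        CappedSessionBuffer<String> buffer = new CappedSessionBuffer<>(2);
        buffer.add("click-1");
        buffer.add("click-2");
        buffer.add("click-3"); // rejected: the cap of 2 is already reached
        System.out.println(buffer.events() + " dropped=" + buffer.dropped());
    }
}
```

Dropping the overflow is only one possible policy; flushing a partial session to the sink when the cap is hit would be another, at the cost of splitting one session across several output records.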

Re: Writing groups of Windows to files

2017-07-04 Thread Fabian Hueske
You are right. Building a large record might result in an OOME. But I think that would also happen with a regular SessionWindow, RocksDBStateBackend, and a WindowFunction that immediately ships the records it receives from the Iterable. As far as I know, a SessionWindow stores all elements in an i

Re: Writing groups of Windows to files

2017-07-04 Thread Niels Basjes
Hi Fabian,
On Fri, Jun 30, 2017 at 6:27 PM, Fabian Hueske wrote:
> If I understand your use case correctly, you'd like to hold back all
> events of a session until it ends/timesout and then write all events out.
> So, instead of aggregating per session (the common use case), you'd just
> like to

Re: Writing groups of Windows to files

2017-06-30 Thread Fabian Hueske
Hi Niels, If I understand your use case correctly, you'd like to hold back all events of a session until it ends or times out and then write all events out. So, instead of aggregating per session (the common use case), you'd just like to collect the events. I would implement a simple WindowFunction th

Re: Writing groups of Windows to files

2017-06-30 Thread Nico Kruber
Looks like you are missing a window *function* that processes the window.
From [1]:
stream
    .keyBy(...)        <- keyed versus non-keyed windows
    .window(...)       <- required: "assigner"
    [.trigger(...)]    <- optional: "trigger" (else default trigger)
    [.evicto

Writing groups of Windows to files

2017-06-30 Thread Niels Basjes
Hi, I have the following situation:
- I have click stream data from a website that contains a session id.
- I then use something like EventTimeSessionWindows to group these events per session into a WindowedStream.
So I essentially end up with a stream of "finished sessions". So far I am able to d
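To make the "stream of finished sessions" idea concrete, here is a rough plain-Java stand-in for gap-based session grouping. It is not Flink code and does not use EventTimeSessionWindows; the names `SessionGapGrouper`, `Click`, and `gapMillis` are hypothetical, and the sketch assumes the clicks arrive already ordered by timestamp, which a real stream would not guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Flink: groups clicks into sessions per
// session id, closing a session when the gap between clicks is too large.
final class SessionGapGrouper {

    // One click event; only the fields needed for grouping are modelled.
    static final class Click {
        final String sessionId;
        final long timestampMillis;

        Click(String sessionId, long timestampMillis) {
            this.sessionId = sessionId;
            this.timestampMillis = timestampMillis;
        }
    }

    // Assumes clicks are ordered by timestamp. A session is finished when the
    // next click with the same id arrives more than gapMillis after the last
    // one; sessions still open at the end of the input are emitted last.
    static List<List<Click>> group(List<Click> clicks, long gapMillis) {
        Map<String, List<Click>> open = new LinkedHashMap<>();
        List<List<Click>> finished = new ArrayList<>();
        for (Click click : clicks) {
            List<Click> current = open.get(click.sessionId);
            if (current != null) {
                Click last = current.get(current.size() - 1);
                if (click.timestampMillis - last.timestampMillis > gapMillis) {
                    finished.add(current); // gap exceeded: close the old session
                    current = null;
                }
            }
            if (current == null) {
                current = new ArrayList<>();
                open.put(click.sessionId, current);
            }
            current.add(click);
        }
        finished.addAll(open.values());
        return finished;
    }

    public static void main(String[] args) {
        List<Click> clicks = new ArrayList<>();
        clicks.add(new Click("a", 0L));
        clicks.add(new Click("b", 100L));
        clicks.add(new Click("a", 500L));
        clicks.add(new Click("a", 5000L));
        System.out.println(group(clicks, 1000L).size() + " sessions");
    }
}
```

Each inner list corresponds to one finished session, which is the unit the thread wants to write to a file.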