Hi Aljoscha,

Thanks for the quick response.
I've seen the Google Cloud Dataflow presentation at Flink Forward and understand the
concepts behind it (which are also supported by Flink).

I will look further into the Stack Overflow question and let you know if I have any more questions.

Once again, thanks,

Leonard

On 02-11-15 10:25, Aljoscha Krettek wrote:
Hi Leonard,
I’m afraid you might be thinking about windows as they are supported by Spark 
Streaming. There, windows are quite limited. In Flink you don’t necessarily have 
to window elements by time since Flink does not collect data in mini-batches 
before processing. Everything is continuously processed and you can have 
arbitrary Trigger strategies that decide when you want to process windows.

The basic idea behind windowing in Flink is that elements are assigned to 
windows by a WindowAssigner and then a Trigger decides when to trigger 
computation for a specific window. This is very similar to the model employed 
by Google Cloud Dataflow, if you are familiar with that.
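
To make the assigner/trigger split concrete, here is a minimal sketch in plain Python. It does not use the Flink API: the names `tumbling_assigner`, `count_trigger` and `run` are hypothetical and only model the idea that one component decides which window an element belongs to while a separate component decides when that window is evaluated.

```python
from collections import defaultdict

def tumbling_assigner(size):
    """Return an assigner mapping a timestamp to the window [start, start + size) containing it."""
    def assign(timestamp):
        start = timestamp - (timestamp % size)
        return [(start, start + size)]
    return assign

def count_trigger(n):
    """Return a trigger that fires once a window holds n elements (no time involved)."""
    def should_fire(window, elements):
        return len(elements) >= n
    return should_fire

def run(events, assign, should_fire, process):
    """Handle (timestamp, value) events one at a time; emit a result whenever the trigger fires."""
    buffers = defaultdict(list)
    results = []
    for timestamp, value in events:
        for window in assign(timestamp):
            buffers[window].append(value)
            if should_fire(window, buffers[window]):
                # Evaluate the window contents and clear its buffer.
                results.append((window, process(buffers.pop(window))))
    return results

events = [(1, 10), (2, 20), (11, 5), (3, 30), (12, 7)]
out = run(events, tumbling_assigner(10), count_trigger(2), sum)
print(out)
```

Swapping `count_trigger` for a different rule changes when results are emitted without touching how elements are assigned to windows, which is the point of keeping the two concerns separate.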

You could have a look at this Stack Overflow question and my answer to it: 
http://stackoverflow.com/questions/33451121/apache-flink-session-support. It 
could be similar to your use case.
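
For readers unfamiliar with sessionization, the general idea can be sketched as follows. This is not the code from the linked answer and not Flink code; it is a plain-Python batch illustration, with a hypothetical `sessionize` helper and an assumed inactivity `gap`, of how events of one key are split into sessions whenever the pause between two consecutive events exceeds the gap.

```python
from collections import defaultdict

def sessionize(events, gap):
    """Group (key, timestamp) events into per-key sessions.

    A new session starts when more than `gap` time units have passed
    since the previous event of the same key.
    """
    by_key = defaultdict(list)
    for key, timestamp in events:
        by_key[key].append(timestamp)

    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()
        key_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - key_sessions[-1][-1] > gap:
                key_sessions.append([ts])    # pause too long: open a new session
            else:
                key_sessions[-1].append(ts)  # still within the current session
        sessions[key] = key_sessions
    return sessions

events = [("a", 1), ("b", 2), ("a", 4), ("a", 20), ("b", 3), ("a", 22)]
result = sessionize(events, gap=5)
print(result)
```

A streaming system has to make the same decision incrementally as events arrive, which is where window assignment and triggering come in.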

Please let me know if you want a more in-depth explanation of the windowing 
system. It is quite a recent addition and arguably the most complex part of any 
streaming system.

Cheers,
Aljoscha
On 02 Nov 2015, at 09:15, Leonard Wolters <leon...@sagent.io> wrote:

Hi,

I was wondering whether Flink has already implemented some sort of consistent
keyBy mapping over multiple windows. The underlying idea is to 'sessionize'
incoming events over time (i.e. across multiple streaming windows) on the same
partitions. As you can imagine, I want to avoid heavy shuffling over the
network.

As far as I can tell from the API docs and the blogs on data-artisans, I can
groupBy events within one (time) window, and those events are automatically
distributed over all partitions. However, since sessions often exceed the
window duration, I want some guarantee that events belonging to the same
session are delivered to the same partition for new windows.

Is this possible, or are there any plans to support this soon?
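
The guarantee asked for above amounts to a key-to-partition mapping that depends only on the key and not on the window. A minimal sketch of such a mapping, in plain Python, is shown below. The function `partition_for`, the use of CRC32 and the session ids are assumptions for illustration only; this is not Flink's actual partitioner.

```python
import zlib

def partition_for(key, num_partitions):
    """Map a key to a partition using a hash that is stable across calls and processes."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Three consecutive windows, each containing events for some session ids.
windows = [["s1", "s2", "s3"], ["s2", "s1"], ["s3", "s1"]]

# Because the mapping depends only on the key, a session id lands on the
# same partition in every window, so no reshuffling between windows is needed.
placements = [{key: partition_for(key, 4) for key in window} for window in windows]
print(placements)
```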

Thanks in advance,

Leonard

--
Leonard Wolters
Chief Product Manager
M: +31 (0)6 55 53 04 01 | T: +31 (0)88 10 44 555
E: leon...@sagent.io | W: sagent.io | Disclaimer | Sagent BV
Herengracht 504 | 1017CB Amsterdam | Netherlands

