Re: Operational concerns with state (was Re: Window limitations on groupBy)

2017-01-19 Thread Raman Gupta
Thank you Fabian, the blog articles were very useful. I will continue experimenting. On Thu, Jan 19, 2017 at 3:36 PM, Fabian Hueske wrote: > Hi Raman, > > Checkpoints are used to recover from task or process failures and usually > automatically taken at periodic intervals if configured correctly.

Re: Operational concerns with state (was Re: Window limitations on groupBy)

2017-01-19 Thread Fabian Hueske
Hi Raman, Checkpoints are used to recover from task or process failures and usually automatically taken at periodic intervals if configured correctly. Checkpoints are usually removed when a more recent checkpoint is completed (the exact policy can be configured). Savepoints are used to restart a

Operational concerns with state (was Re: Window limitations on groupBy)

2017-01-19 Thread Raman Gupta
I was able to get it working well with the original approach you described. Thanks! Note that the documentation on how to do this with the Java API is... sparse, to say the least. I was able to look at the implementation of the scala flatMapWithState function as a starting point. Now I'm trying to

Re: Window limitations on groupBy

2017-01-19 Thread Fabian Hueske
Hi Raman, I think you would need a sliding count window of size 2 with slide 1. This is basically a GlobalWindow with a special trigger. However, you would need to modify the custom trigger to be able to - identify a terminal event (if there is such a thing) or to - close the window after a certa

Re: Window limitations on groupBy

2017-01-18 Thread Raman Gupta
Thank you for your reply. If I were to use a keyed stream with a count-based window of 2, would Flink keep the last state persistently until the next state is received? Would this be another way of having Flink keep this information persistently without having to implement it manually? Thanks, Ra

Re: Window limitations on groupBy

2017-01-18 Thread Fabian Hueske
Hi Raman, I would approach this issues as follows. You key the input stream on the sourceId and apply a stateful FlatMapFunction. The FlatMapFunction has a key-partioned state and stores for each key (sourceId) the latest event as state. When a new event arrives, you can compute the time spend in

Window limitations on groupBy

2017-01-18 Thread Raman Gupta
I am investigating Flink. I am considering a relatively simple use case -- I want to ingest streams of events that are essentially timestamped state changes. These events may look something like: { sourceId: 111, state: OPEN, timestamp: } I want to apply various processing to these state c