Re: Fault tolerance & idempotency on window functions

2017-05-04 Thread Aljoscha Krettek
Hi, When keying, keep in mind that Kafka and Flink might use a different scheme for hashing. For example, Flink also applies a murmur hash on the hash code retrieved from the key and then has some internal logic for assigning that hash to a key group (the internal unit of key partitioning). I do

Re: Fault tolerance & idempotency on window functions

2017-04-29 Thread Kamil Dziublinski
Big thanks for replying Aljoscha, I spend quite some time on thinking how to solve this problem and came to some conclusions. Would be cool if you can verify if my logic is correct. I decided that if I will partition data in kafka in the same way as I partition my window with keyby. It's tenant, u

Re: Fault tolerance & idempotency on window functions

2017-04-28 Thread Aljoscha Krettek
Hi, Yes, your analysis is correct: Flink will not retry for individual elements but will restore from the latest consistent checkpoint in case of failure. This also means that you can get different window results based on which element arrives first, i.e. you have a different timestamp on your o

Fault tolerance & idempotency on window functions

2017-04-25 Thread Kamil Dziublinski
Hi guys, I have a flink streaming job that reads from kafka, creates some statistics increments and stores this in hbase (using normal puts). I'm using fold function here of with window of few seconds. My tests showed me that restoring state with window functions is not exactly working how I expe