Re: Incremental updates

2016-05-30 Thread Aljoscha Krettek
Hi, the state will be kept indefinitely but we are planning to introduce a setting that would allow setting a time-to-live on state. I think this is exactly what you would need. As an alternative, maybe you could implement your program using windows? In this way you would also bound how long state

Re: Incremental updates

2016-05-27 Thread Malgorzata Kudelska
Hi, If I specify the userId as the key variable as you suggested, will the state variables be kept for every observed value of the key? I have a situation where I have a lot of userIds and many of them occure just once, so I don't want to keep the state for them for ever. I need the possibility to

Re: Incremental updates

2016-05-26 Thread Aljoscha Krettek
Hi, newly added nodes would sit idle, yes. Only when we finish the rescaling work mentioned in the link will we be able to dynamically adapt. The internal implementation of this will in fact hash keys to a larger number of partitions than the number of individual partitions and use these "key grou

Re: Incremental updates

2016-05-26 Thread Malgorzata Kudelska
Hi, So is there any possibility to utilize an extra node that joins the cluster or will it remain idle? What if I use a custom key function that matches the key variable to a number of keys bigger than the initial number of nodes (following the idea from your link)? What about running flink on yarn

Re: Incremental updates

2016-05-25 Thread Aljoscha Krettek
Hi, first question: are you manually keying by "userId % numberOfPartitions"? Flink internally does roughly "key.hash() % numPartitions" so it is enough to specify the userId as your key. Now, for you questions: 1. What Flink guarantees is that the state for a key k is always available when an el

Re: Incremental updates

2016-05-25 Thread Malgorzata Kudelska
Hi, I have the following situation. - a keyed stream with a key defined as: userId % numberOfPartitions - a custom flatMap transformation where I use a StateValue variable to keep the state of some calculations for each userId - my questions are: 1. Does flink guarantee that the users with a given

Re: Incremental updates

2016-05-25 Thread Aljoscha Krettek
Hi, right now, this does not work but we're is also actively working on that. This is the design doc for part one of the necessary changes: https://docs.google.com/document/d/1G1OS1z3xEBOrYD4wSu-LuBCyPUWyFd9l3T9WyssQ63w/edit?usp=sharing Cheers, Aljoscha On Wed, 25 May 2016 at 13:32 Malgorzata Kud

Re: Incremental updates

2016-05-25 Thread Malgorzata Kudelska
Hi, Thanks for your reply. Is Flink able to detect that an additional server joined and rebalance the processing? How is it done if I have a keyed stream and some custom ValueState variables? Cheers, Gosia 2016-05-25 11:32 GMT+02:00 Aljoscha Krettek : > Hi Gosia, > right now, Flink is not doing

Re: Incremental updates

2016-05-25 Thread Aljoscha Krettek
Hi Gosia, right now, Flink is not doing incremental checkpoints. Every checkpoint is fully valid in isolation. Incremental checkpointing came up several times on ML discussions and we a planning to work on it once someone finds some free time. Cheers, Aljoscha On Wed, 25 May 2016 at 09:29 Rubén C

Re: Incremental updates

2016-05-25 Thread Rubén Casado
Hi Gosia You can have a look to the PROTEUS project we are doing [1]. We are implementing incremental version of analytics operations. For example you can see in [2] the implementation of the incremental AVG. Maybe the code can give you some ideas :-) [1] https://github.com/proteus-h2020/pr