Hi, I was playing with it more today and I think I've found a workaround.
So what I do: 1. I define a constant N logical groups 2. I use consistent hash mapping of data keys to these groups 3. I map these groups to partitions using even distribution (same as Flink distributes state) 4. In a stateful function I'm able to calculate wich groups are assigned to that partition and produce the right number of states for each groups (empty states too) 5. I do manual partitioning before that stateful function using same calculations with groups So far it looks like scaling up and down results in correct behavior. Can I rely on Flink distributing state evenly and in the order I return it in the list? Best regards, Dmitry On Tue, Feb 14, 2017 at 9:33 AM, Stefan Richter <s.rich...@data-artisans.com > wrote: > Hi, > > there is something that we call "raw keyed“ operator state, which might > exactly serve your purpose. It is already used internally by Flink’s window > operator, but there exists currently no public API for this feature. Way it > works currently is that you obtain input and output streams that are aware > of key-groups being written or read, but the API needs to consider the fact > that each key-group must be written only once and complete before the next > key-group can start. This is a bit tricky to expose for inheritance > hierarchies. My guess is that you can expect this for the next version of > Flink. > > Best, > Stefan > > Am 14.02.2017 um 08:31 schrieb Tzu-Li (Gordon) Tai <tzuli...@apache.org>: > > Hi Dmitry, > > Technically, from the looks of the internal code around > `OperatorStateRepartitioner`, I think it is certainly possible to be > pluggable. > Right now it is just hard coded to use a round-robin repartitioner > implementation as default. > > However, I’m not sure of the plans in exposing this to the user and making > it configurable. > Looping in Stefan (in cc) who mostly worked on this part and see if he can > provide more info. > > - Gordon > > On February 14, 2017 at 2:30:27 AM, Dmitry Golubets (dgolub...@gmail.com) > wrote: > > Hi, > > It looks impossible to implement a keyed state with operator state now. > > I know it sounds like "just use a keyed state", but latter requires > updating it on every value change as opposed to operator state and thus can > be expensive (especially if you have to deal with mutable structures inside > which have to be serialized). > > The problem is that there is no way to tell Flink how to reassign > savepoint parts between partitions, and thus impossible to route data to > correct partitions. > > Is there anything I missed or maybe a plan to implement it in future? > > Best regards, > Dmitry > > >