StefanRRichter edited a comment on issue #7009: [FLINK-10712] Support to restore state when using RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/7009#issuecomment-453954020

@Myasuka Yes, in the current implementation union state is a problem, and unfortunately it is also used in some popular operators. For example, KafkaConsumer "abuses" union state to get a rescaling protocol that can support partition discovery. In a nutshell, during restore all parallel instances see the offsets of all partitions, and every instance cherry-picks its partitions through a protocol that all instances follow. So every partition goes to exactly one operator instance, and the instances do not need to communicate with each other for that.

The contract of union state is that in recovery, each operator instance sees the union of the states from all instances. The question for partial recovery is now: will recovering instances see i) all states, ii) only the states of the other recovering instances, or iii) only their own old state? I think we should most likely go for option i), but this also means that all states would have to go into the operation and cannot be excluded via the index.

Then there is another problem in the current implementation that can lead to bugs with your code in the following way: if there is at least one union state, all other operator states go through round-robin reassignment as well. So we round-robin reassign some state, but only the restarting operators load the reassigned version of the state. Instances that we keep running continue with the old assignment of the state. This can lead to some partitions being assigned twice or not being assigned at all.

One way to solve this problem would be to separate union states from other operator states and only round-robin reassign operator state if the parallelism changed (which never happens for recoveries, only for restarts).
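
To make the two mechanisms above concrete, here is a minimal, self-contained sketch in plain Java. It is not Flink code: the class `UnionStateSketch`, all method names, and the modulo ownership rule are hypothetical stand-ins chosen for illustration, and the round-robin logic is a simplification of what the real repartitioner does. The first part models the cherry-picking of partitions from union state; the second part models how a partial recovery can mix the old and the reassigned version of round-robin state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified model; none of these names are Flink APIs.
class UnionStateSketch {

    // Assumed rule shared by all instances: partition p is owned by subtask (p mod parallelism).
    static int ownerOf(int partition, int parallelism) {
        return Math.floorMod(partition, parallelism);
    }

    // Union-state restore: every subtask receives all offsets and keeps only its own partitions.
    static Map<Integer, Long> cherryPick(
            Map<Integer, Long> unionOfOffsets, int subtask, int parallelism) {
        Map<Integer, Long> mine = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : unionOfOffsets.entrySet()) {
            if (ownerOf(e.getKey(), parallelism) == subtask) {
                mine.put(e.getKey(), e.getValue());
            }
        }
        return mine;
    }

    // Round-robin reassignment: flatten all state items, then deal them out in turn.
    static List<List<String>> roundRobin(List<List<String>> oldAssignment, int parallelism) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            result.add(new ArrayList<>());
        }
        int next = 0;
        for (List<String> items : oldAssignment) {
            for (String item : items) {
                result.get(next % parallelism).add(item);
                next++;
            }
        }
        return result;
    }

    // Partial recovery with unchanged parallelism: restarted subtasks load the
    // reassigned state, all other subtasks keep running with the old assignment.
    static List<List<String>> effectiveAfterPartialRecovery(
            List<List<String>> oldAssignment,
            List<List<String>> reassigned,
            Set<Integer> restarted) {
        List<List<String>> effective = new ArrayList<>();
        for (int i = 0; i < oldAssignment.size(); i++) {
            effective.add(restarted.contains(i) ? reassigned.get(i) : oldAssignment.get(i));
        }
        return effective;
    }

    public static void main(String[] args) {
        // Subtask 0 held A and B, subtask 1 held C; only subtask 1 restarts.
        List<List<String>> old = List.of(List.of("A", "B"), List.of("C"));
        List<List<String>> reassigned = roundRobin(old, 2);
        System.out.println(effectiveAfterPartialRecovery(old, reassigned, Set.of(1)));
    }
}
```

In this toy run, round-robin moves C to subtask 0 and B to subtask 1. Because only subtask 1 loads the new assignment, the job ends up with `[[A, B], [B]]`: B is held twice and C by nobody. Skipping the reassignment whenever the parallelism is unchanged would leave subtask 1 with its old state `[C]` and avoid the mismatch.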