StefanRRichter edited a comment on issue #7009: [FLINK-10712] Support to 
restore state when using RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/7009#issuecomment-453954020
 
 
   @Myasuka Yes, in the current implementation union state is a problem and 
unfortunately it is also used in some popular operators. For example, 
KafkaConsumer "abuses" union state to have a rescaling protocol that can 
support partition discovery. In a nutshell, during restore all parallel 
instances see the offsets of all partitions and every instance will cherry-pick 
partitions through a protocol that all instances follow. So every partition 
will go to exactly one operator and there is no need for communication between 
instances for that.
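
A minimal sketch of the idea, not the actual KafkaConsumer code: the class name, the method names, and the modulo ownership rule are all assumptions made for illustration. Every subtask receives the same union list and applies the same deterministic rule, so the picks are disjoint and complete without any coordination.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionStatePartitionPick {

    // Hypothetical ownership rule: partition p belongs to subtask (p mod parallelism).
    static boolean isOwnedBy(int partitionId, int subtaskIndex, int parallelism) {
        return Math.floorMod(partitionId, parallelism) == subtaskIndex;
    }

    // Each subtask sees the full union of partitions and keeps only its own.
    static List<Integer> pick(List<Integer> unionOfPartitions, int subtaskIndex, int parallelism) {
        List<Integer> owned = new ArrayList<>();
        for (int partitionId : unionOfPartitions) {
            if (isOwnedBy(partitionId, subtaskIndex, parallelism)) {
                owned.add(partitionId);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        List<Integer> union = Arrays.asList(0, 1, 2, 3, 4, 5, 6);
        int parallelism = 3;
        for (int subtask = 0; subtask < parallelism; subtask++) {
            System.out.println("subtask " + subtask + " -> " + pick(union, subtask, parallelism));
        }
    }
}
```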
   
   The contract of union state is that in recovery, each operator instance sees 
the union of states from all instances. The question for partial recovery is 
now: will recovering instances see i) all states, ii) all states from other 
recovering instances, or iii) only their old state? I think most likely, we 
should go for option i), but this also means that all states would have to go 
into the operation and cannot be excluded via the index.
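
To make the three options concrete, here is a small sketch. The enum, the method, and the map-based state model are hypothetical and not part of Flink; option ii) is modeled as the states of all recovering instances, including the instance itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class UnionVisibility {

    // Hypothetical names for options i), ii) and iii).
    enum Option { ALL_STATES, RECOVERING_ONLY, OWN_OLD_STATE }

    // Returns the state entries that the recovering subtask `self` would see.
    static List<String> visibleStates(Option option,
                                      Map<Integer, List<String>> stateBySubtask,
                                      Set<Integer> recovering,
                                      int self) {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> entry : stateBySubtask.entrySet()) {
            int subtask = entry.getKey();
            boolean include;
            switch (option) {
                case ALL_STATES:
                    include = true;
                    break;
                case RECOVERING_ONLY:
                    include = recovering.contains(subtask);
                    break;
                default: // OWN_OLD_STATE
                    include = subtask == self;
                    break;
            }
            if (include) {
                visible.addAll(entry.getValue());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> state = new TreeMap<>();
        state.put(0, Arrays.asList("a"));
        state.put(1, Arrays.asList("b"));
        state.put(2, Arrays.asList("c"));
        Set<Integer> recovering = new HashSet<>(Arrays.asList(1, 2));
        for (Option option : Option.values()) {
            System.out.println(option + " -> " + visibleStates(option, state, recovering, 1));
        }
    }
}
```

Under option i) the state of subtask 0 must be available even though subtask 0 keeps running, which is why no state can be filtered out up front.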
   
   Then there is another problem in the current implementation that can lead to 
bugs with your code in the following way: if there is at least one union state, 
all other operator states will go through round-robin reassignment as well. So, 
we round-robin reassign some state, but only restarting operators load the 
reassigned version of the state. Instances that we keep running will run with 
the old assignment of the state. This can lead to some partitions being 
assigned twice or not being assigned at all. 
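
The failure mode can be reproduced with a toy model. Everything here is an assumption for illustration: the initial assignment, the simple round-robin rule, and the class and method names are not Flink's actual repartitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartialRecoveryMismatch {

    // Hypothetical round-robin: the i-th item of the flattened state goes to subtask (i mod parallelism).
    static List<List<Integer>> roundRobin(List<List<Integer>> oldAssignment) {
        int parallelism = oldAssignment.size();
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            result.add(new ArrayList<>());
        }
        int next = 0;
        for (List<Integer> perSubtask : oldAssignment) {
            for (int item : perSubtask) {
                result.get(next % parallelism).add(item);
                next++;
            }
        }
        return result;
    }

    // Only the restarted subtask loads the reassigned state; the others keep their old state.
    static List<List<Integer>> afterPartialRecovery(List<List<Integer>> oldAssignment, int restartedSubtask) {
        List<List<Integer>> effective = new ArrayList<>(oldAssignment);
        effective.set(restartedSubtask, roundRobin(oldAssignment).get(restartedSubtask));
        return effective;
    }

    // Counts how many subtasks own each item after the partial recovery.
    static Map<Integer, Integer> ownerCounts(List<List<Integer>> oldAssignment, List<List<Integer>> effective) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<Integer> perSubtask : oldAssignment) {
            for (int item : perSubtask) {
                counts.put(item, 0);
            }
        }
        for (List<Integer> perSubtask : effective) {
            for (int item : perSubtask) {
                counts.put(item, counts.get(item) + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<Integer>> old = Arrays.asList(
                Arrays.asList(0, 1), Arrays.asList(2, 3), Arrays.asList(4, 5));
        List<List<Integer>> effective = afterPartialRecovery(old, 1);
        System.out.println("effective assignment: " + effective);
        System.out.println("owners per partition: " + ownerCounts(old, effective));
    }
}
```

With this input, the restarted subtask 1 loads partitions 1 and 4, which subtasks 0 and 2 still hold, while partitions 2 and 3 end up with no owner.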
   
   One way to solve this problem would be to separate union states from other 
operator states and only round-robin reassign operator state if the parallelism 
did change (which it never does for recoveries, only for restarts).
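
A rough sketch of how I read this suggestion for the non-union part: leave the assignment untouched when the parallelism is unchanged, and redistribute only on a rescale. The class, the method, and the simple round-robin rule are hypothetical, not the existing Flink code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReassignOnlyOnRescale {

    // Keeps the old assignment when the parallelism is unchanged (the recovery case),
    // and round-robin redistributes only when it changed (a rescaling restart).
    static List<List<Integer>> restore(List<List<Integer>> oldAssignment, int newParallelism) {
        if (newParallelism == oldAssignment.size()) {
            return oldAssignment;
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>());
        }
        int next = 0;
        for (List<Integer> perSubtask : oldAssignment) {
            for (int item : perSubtask) {
                result.get(next % newParallelism).add(item);
                next++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> old = Arrays.asList(
                Arrays.asList(0, 1), Arrays.asList(2, 3), Arrays.asList(4, 5));
        System.out.println("same parallelism: " + restore(old, 3));
        System.out.println("rescaled to 2:    " + restore(old, 2));
    }
}
```

With the assignment stable across a recovery, restarted and still-running instances agree on who owns what.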
