StefanRRichter edited a comment on issue #7009: [FLINK-10712] Support to 
restore state when using RestartPipelinedRegionStrategy
URL: https://github.com/apache/flink/pull/7009#issuecomment-452649422
 
 
   I have one more concern that might lead to bugs in a certain corner case: 
what happens with your change if the task uses operator state, union state in 
particular? In `applyRepartitioner()`,
   
   
https://github.com/apache/flink/blob/1e2aa8e9f35e7a943a4ed56a47834ee50bab3b47/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/StateAssignmentOperation.java#L636
 
   
   you can see that all operator state is repartitioned if there is one union 
state. This can already be a problem with a union state alone, but even more so 
if there is a union state and some partitionable state: the partitioning of the 
partitionable state for the tasks that are restarted could differ from the 
partitioning used in the original run, so some partitions could be dropped or 
assigned twice. I think that means we need to change the method to redistribute 
only the union states, and I wonder whether redistributing the union state only 
for the failed tasks makes sense at all. I think we need a test case for this 
scenario (an operator with one union state and one partitionable operator 
state), and I think it might fail as described above when we check how the 
operator state was reassigned after a partial recovery. What do you think?
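
   To make the concern concrete, here is a minimal toy model of the scenario. 
It is only a sketch under stated assumptions: `PartialRecoverySketch` and all 
of its methods are hypothetical names, and the round-robin dealing is a 
simplified stand-in for a redistribution step, not the actual logic behind 
`applyRepartitioner()`. It only shows why mixing "original" and "redistributed" 
assignments after a partial restart can drop or duplicate partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: none of these names exist in Flink.
class PartialRecoverySketch {

    // Simplified stand-in for a redistribution step (an assumption, not Flink's
    // algorithm): collect every partition and deal them out round-robin over
    // the same number of subtasks.
    static List<List<String>> redistributeRoundRobin(List<List<String>> original) {
        int parallelism = original.size();
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            result.add(new ArrayList<>());
        }
        int next = 0;
        for (List<String> partitionsOfSubtask : original) {
            for (String partition : partitionsOfSubtask) {
                result.get(next % parallelism).add(partition);
                next++;
            }
        }
        return result;
    }

    // After a partial recovery, running subtasks keep the partitions they had,
    // while restarted subtasks receive the redistributed assignment.
    static List<List<String>> afterPartialRecovery(
            List<List<String>> original,
            List<List<String>> redistributed,
            Set<Integer> restartedSubtasks) {
        List<List<String>> effective = new ArrayList<>();
        for (int i = 0; i < original.size(); i++) {
            effective.add(restartedSubtasks.contains(i) ? redistributed.get(i) : original.get(i));
        }
        return effective;
    }

    // Counts how many subtasks own each partition: 0 means dropped, 2+ means duplicated.
    static Map<String, Integer> ownershipCounts(
            List<List<String>> original, List<List<String>> effective) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> partitions : original) {
            for (String partition : partitions) {
                counts.put(partition, 0);
            }
        }
        for (List<String> partitions : effective) {
            for (String partition : partitions) {
                counts.merge(partition, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Three subtasks with unevenly sized partitionable state; only subtask 1 is restarted.
    static Map<String, Integer> demo() {
        List<List<String>> original = Arrays.asList(
                Arrays.asList("p0", "p1", "p2"),
                Arrays.asList("p3"),
                Arrays.asList("p4", "p5"));
        List<List<String>> redistributed = redistributeRoundRobin(original);
        Set<Integer> restarted = new HashSet<>(Arrays.asList(1));
        return ownershipCounts(original, afterPartialRecovery(original, redistributed, restarted));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

   In this toy run, subtask 1 originally owned `p3` but is handed `p1` and `p4` 
after the restart, so the program prints `{p0=1, p1=2, p2=1, p3=0, p4=2, p5=1}`: 
`p3` is dropped, while `p1` and `p4` are each owned twice. A real test case 
would make the same kind of assertion against the actual reassigned operator 
state.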

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
