XComp commented on a change in pull request #14847:
URL: https://github.com/apache/flink/pull/14847#discussion_r581941734



##########
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/DefaultSchedulerTest.java
##########
@@ -541,6 +544,59 @@ public void handleGlobalFailure() {
         assertThat(deployedExecutionVertices, contains(executionVertexId, 
executionVertexId));
     }
 
+    /**
+     * This test covers the use-case where a global fail-over is followed by a 
local task failure.
+     * It verifies (besides checking the expected deployments) that the assert 
in the global
+     * recovery handling of {@link SchedulerBase#restoreState} is not 
triggered due to version
+     * updates.
+     */
+    @Test
+    public void handleGlobalFailureWithLocalFailure() {

Review comment:
       We cannot remove [assert in 
SchedulerBase.restoreState](https://github.com/XComp/flink/blob/7cbd97f815d3bfd4715ddd8cb1d88f92f05a282a/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/SchedulerBase.java#L419)
 as the following operation relies on all vertices being passed for restoring 
the state. Accepting less `ExecutionJobVertices` would mean that not all tasks 
would get restored.
   
   I added this test to verify the case where a local task failure proceeds a 
global fail-over before actually restarting the tasks. The local task failure 
would not trigger a version update because of the 
[isNotifiable](https://github.com/XComp/flink/blob/7cbd97f815d3bfd4715ddd8cb1d88f92f05a282a/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/SchedulerBase.java#L654)
 check in `SchedulerBase.updateTaskExecutionState`. This will only return 
`true` (and trigger the internal task state change) if the corresponding 
`ExecutionVertex` is either in state `FINISHED` or `FAILED`. The previous 
global fail-over has switched the state already to `CANCELING`. The local task 
failure will finally switch the state to `CANCELED`. Hence, no internal task 
state change is processed for the local task failure. No version upgrade is 
triggered and the global fail-over restart restarts the all vertices as 
expected.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to