[ https://issues.apache.org/jira/browse/FLINK-24162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410725#comment-17410725 ]
Roman Khachatryan commented on FLINK-24162:
-------------------------------------------

Thanks for looking into it [~gaoyunhaii]. I can confirm that the task transitions to FINISHED twice, before and after a failure:
{code:java}
23:16:57,760 INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Custom Source -> Timestamps/Watermarks -> transform-1-forward -> Sink: Unnamed (1/4)#0 (4a3e2cd18c9e79b42dc8d6624fcbcde8) switched from RUNNING to FINISHED.
...
23:16:57,837 [ Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 3 (type=CHECKPOINT) @ 1630711017835 for job 3d9486075a07c60f7d6927cff31ab0db.
23:16:57,840 [jobmanager-io-thread-18] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed checkpoint 3 for job 3d9486075a07c60f7d6927cff31ab0db (0 bytes, checkpointDuration=5 ms, finalizationTime=0 ms).
23:16:57,849 [Source: Custom Source -> Timestamps/Watermarks -> transform-1-forward -> Sink: Unnamed (3/4)#0] WARN org.apache.flink.runtime.taskmanager.Task [] - Source: Custom Source -> Timestamps/Watermarks -> transform-1-forward -> Sink: Unnamed (3/4)#0 (f8498b498d21de0ce1edd1175a20e5a6) switched from RUNNING to FAILED with failure cause:
java.lang.RuntimeException: requested to fail
	at org.apache.flink.runtime.operators.lifecycle.graph.TestEventSource.run(TestEventSource.java:82)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:116)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:73)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:323)
...
23:16:58,357 INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Custom Source -> Timestamps/Watermarks -> transform-1-forward -> Sink: Unnamed (1/4)#1 (4c131c07267e65d0365a4f2db71f41dc) switched from RUNNING to FINISHED.
{code}
There is a checkpoint (3) that completes after subtask (1/4) has finished, and it is used for recovery. You're right that the whole job is restarted. However, shouldn't that always be the case? TestJobBuilders#prepareEnv sets:
{code:java}
configuration.set(EXECUTION_FAILOVER_STRATEGY, "full");
{code}

> PartiallyFinishedSourcesITCase fails due to assertion error in DrainingValidator.validateOperatorLifecycle
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-24162
>                 URL: https://issues.apache.org/jira/browse/FLINK-24162
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.14.0, 1.15.0
>            Reporter: Xintong Song
>            Priority: Blocker
>              Labels: test-stability
>             Fix For: 1.14.0, 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23526&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=4639
> {code}
> Sep 03 23:17:11 [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.233 s <<< FAILURE! - in org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase
> Sep 03 23:17:11 [ERROR] test[simple graph SINGLE_SUBTASK, failover: true]  Time elapsed: 2.27 s  <<< FAILURE!
> Sep 03 23:17:11 java.lang.AssertionError
> Sep 03 23:17:11 	at org.junit.Assert.fail(Assert.java:87)
> Sep 03 23:17:11 	at org.junit.Assert.assertTrue(Assert.java:42)
> Sep 03 23:17:11 	at org.junit.Assert.assertFalse(Assert.java:65)
> Sep 03 23:17:11 	at org.junit.Assert.assertFalse(Assert.java:75)
> Sep 03 23:17:11 	at org.apache.flink.runtime.operators.lifecycle.validation.DrainingValidator.validateOperatorLifecycle(DrainingValidator.java:56)
> Sep 03 23:17:11 	at org.apache.flink.runtime.operators.lifecycle.validation.TestOperatorLifecycleValidator.lambda$checkOperatorsLifecycle$1(TestOperatorLifecycleValidator.java:52)
> Sep 03 23:17:11 	at java.util.HashMap.forEach(HashMap.java:1289)
> Sep 03 23:17:11 	at org.apache.flink.runtime.operators.lifecycle.validation.TestOperatorLifecycleValidator.checkOperatorsLifecycle(TestOperatorLifecycleValidator.java:47)
> Sep 03 23:17:11 	at org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:94)
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
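The double FINISHED transition confirmed in the comment can be spotted mechanically in a TaskManager log. The following is a minimal, stdlib-only Java sketch; the class `FinishedTwiceCheck` and method `finishedMoreThanOnce` are hypothetical helpers, not part of the Flink test suite, and they do not reproduce what `DrainingValidator` actually asserts. The sketch assumes the `(index/parallelism)#attempt (executionId) switched from RUNNING to FINISHED` naming seen in the quoted log lines and reports every subtask index that reached FINISHED in more than one execution attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FinishedTwiceCheck {

    // Assumed log shape: "... (1/4)#0 (4a3e...) switched from RUNNING to FINISHED."
    // Group 1 = subtask index, group 2 = attempt number; lines with other transitions do not match.
    private static final Pattern FINISHED = Pattern.compile(
            "\\((\\d+)/\\d+\\)#(\\d+) \\([0-9a-f]+\\) switched from RUNNING to FINISHED");

    /** Maps each subtask index to the sorted attempts in which it reached FINISHED, keeping only repeats. */
    static Map<Integer, List<Integer>> finishedMoreThanOnce(List<String> logLines) {
        Map<Integer, TreeSet<Integer>> attemptsBySubtask = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = FINISHED.matcher(line);
            if (m.find()) {
                int subtask = Integer.parseInt(m.group(1));
                int attempt = Integer.parseInt(m.group(2));
                attemptsBySubtask.computeIfAbsent(subtask, k -> new TreeSet<>()).add(attempt);
            }
        }
        // Keep only subtasks that finished in two or more distinct attempts.
        Map<Integer, List<Integer>> repeated = new TreeMap<>();
        attemptsBySubtask.forEach((subtask, attempts) -> {
            if (attempts.size() > 1) {
                repeated.put(subtask, new ArrayList<>(attempts));
            }
        });
        return repeated;
    }

    public static void main(String[] args) {
        // Task-state lines from the log excerpt in the comment (middle line abbreviated, others omitted).
        List<String> log = List.of(
                "23:16:57,760 INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Custom Source -> "
                        + "Timestamps/Watermarks -> transform-1-forward -> Sink: Unnamed (1/4)#0 "
                        + "(4a3e2cd18c9e79b42dc8d6624fcbcde8) switched from RUNNING to FINISHED.",
                "... Sink: Unnamed (3/4)#0 (f8498b498d21de0ce1edd1175a20e5a6) switched from RUNNING to FAILED ...",
                "23:16:58,357 INFO ... Sink: Unnamed (1/4)#1 (4c131c07267e65d0365a4f2db71f41dc) "
                        + "switched from RUNNING to FINISHED.");
        System.out.println(finishedMoreThanOnce(log)); // prints {1=[0, 1]}
    }
}
```

Applied to the quoted lines, the only repeated subtask is (1/4): it finished in attempt #0 and, after the job-wide restart caused by the failure of (3/4), again in attempt #1.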