[ https://issues.apache.org/jira/browse/FLINK-23233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378164#comment-17378164 ]
Stephan Ewen commented on FLINK-23233:
--------------------------------------

Thanks, [~gaoyunhaii], for debugging this. Your analysis of the cause is correct. I commented on the pull request regarding the suggested solution.

Regarding the bugfix release criticality: this condition can only happen in conjunction with actual RPC loss plus a very fast checkpoint interval, so this bug should be super rare. The original RPC loss fix was already fixing a rare issue; this one should be even rarer. Of course we should fix it ASAP, but I would expect it to be hard to observe in practice outside of tests with very specific setups. So if this fix takes a while, we may not want to block other, more critical fixes on it.

> OperatorEventSendingCheckpointITCase.testOperatorEventLostWithReaderFailure fails on azure
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-23233
>                 URL: https://issues.apache.org/jira/browse/FLINK-23233
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.0, 1.12.3, 1.13.1
>            Reporter: Xintong Song
>            Assignee: Yun Gao
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.14.0, 1.12.5, 1.13.2
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19857&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=c2734c79-73b6-521c-e85a-67c7ecae9107&l=9382
> {code}
> Jul 03 01:37:31 [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 21.415 s <<< FAILURE! - in org.apache.flink.runtime.operators.coordination.OperatorEventSendingCheckpointITCase
> Jul 03 01:37:31 [ERROR] testOperatorEventLostWithReaderFailure(org.apache.flink.runtime.operators.coordination.OperatorEventSendingCheckpointITCase)  Time elapsed: 3.623 s  <<< FAILURE!
> Jul 03 01:37:31 java.lang.AssertionError: expected:<[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]> but was:<[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67]>
> Jul 03 01:37:31 	at org.junit.Assert.fail(Assert.java:88)
> Jul 03 01:37:31 	at org.junit.Assert.failNotEquals(Assert.java:834)
> Jul 03 01:37:31 	at org.junit.Assert.assertEquals(Assert.java:118)
> Jul 03 01:37:31 	at org.junit.Assert.assertEquals(Assert.java:144)
> Jul 03 01:37:31 	at org.apache.flink.runtime.operators.coordination.OperatorEventSendingCheckpointITCase.runTest(OperatorEventSendingCheckpointITCase.java:254)
> Jul 03 01:37:31 	at org.apache.flink.runtime.operators.coordination.OperatorEventSendingCheckpointITCase.testOperatorEventLostWithReaderFailure(OperatorEventSendingCheckpointITCase.java:143)
> Jul 03 01:37:31 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 03 01:37:31 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 03 01:37:31 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 03 01:37:31 	at java.lang.reflect.Method.invoke(Method.java:498)
> Jul 03 01:37:31 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> Jul 03 01:37:31 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 03 01:37:31 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> Jul 03 01:37:31 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 03 01:37:31 	at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 03 01:37:31 	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> Jul 03 01:37:31 	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Jul 03 01:37:31 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> Jul 03 01:37:31 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> Jul 03 01:37:31 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> {code}