[ 
https://issues.apache.org/jira/browse/FLINK-24162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410707#comment-17410707
 ] 

Yun Gao commented on FLINK-24162:
---------------------------------

I should roughly get the reason: it is due to that the test verifies for each 
operator, it should only received max watermark and EndOfData once for all the 
attempts. However, this is in fact not ensured:  if a task get finished first, 
then get restarted due to some downstream tasks failed, it would still re-emit 
max watermark and EndOfData. This is required for the downstream tasks to align.

This only happen with adaptive scheduler since now the test is in fact using 
region failover, with the default scheduler the finish task would not restart 
in failover. Sorry I do not fully got why with adaptive scheduler the whole job 
is restarted, but from the log this should indeed happen.

I'll open a PR to fix the issue~

> PartiallyFinishedSourcesITCase fails due to assertion error in 
> DrainingValidator.validateOperatorLifecycle
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-24162
>                 URL: https://issues.apache.org/jira/browse/FLINK-24162
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.14.0, 1.15.0
>            Reporter: Xintong Song
>            Priority: Blocker
>              Labels: test-stability
>             Fix For: 1.14.0, 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23526&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=4639
> {code}
> Sep 03 23:17:11 [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, 
> Time elapsed: 19.233 s <<< FAILURE! - in 
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase
> Sep 03 23:17:11 [ERROR] test[simple graph SINGLE_SUBTASK, failover: true]  
> Time elapsed: 2.27 s  <<< FAILURE!
> Sep 03 23:17:11 java.lang.AssertionError
> Sep 03 23:17:11       at org.junit.Assert.fail(Assert.java:87)
> Sep 03 23:17:11       at org.junit.Assert.assertTrue(Assert.java:42)
> Sep 03 23:17:11       at org.junit.Assert.assertFalse(Assert.java:65)
> Sep 03 23:17:11       at org.junit.Assert.assertFalse(Assert.java:75)
> Sep 03 23:17:11       at 
> org.apache.flink.runtime.operators.lifecycle.validation.DrainingValidator.validateOperatorLifecycle(DrainingValidator.java:56)
> Sep 03 23:17:11       at 
> org.apache.flink.runtime.operators.lifecycle.validation.TestOperatorLifecycleValidator.lambda$checkOperatorsLifecycle$1(TestOperatorLifecycleValidator.java:52)
> Sep 03 23:17:11       at java.util.HashMap.forEach(HashMap.java:1289)
> Sep 03 23:17:11       at 
> org.apache.flink.runtime.operators.lifecycle.validation.TestOperatorLifecycleValidator.checkOperatorsLifecycle(TestOperatorLifecycleValidator.java:47)
> Sep 03 23:17:11       at 
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:94)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to