[ 
https://issues.apache.org/jira/browse/FLINK-35722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860782#comment-17860782
 ] 

Matthias Pohl commented on FLINK-35722:
---------------------------------------

We touched checkpointing as part of FLINK-35552. However, only the 
{{CheckpointStatsTracker}} was modified, and the main changes happened in the 
{{AdaptiveScheduler}}. The test failure occurred while the 
{{DefaultScheduler}} was in use.

Moving the {{CheckpointStatsTracker}} out of the 
{{DefaultExecutionGraphBuilder}} into the {{Scheduler}} shouldn't have this 
kind of impact. I wasn't able to reproduce the issue locally either.

That's why I suspect that the test instability is unrelated to the 
FLINK-35552 change.

> CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint 
> fails because of missed operator event
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-35722
>                 URL: https://issues.apache.org/jira/browse/FLINK-35722
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 2.0.0, 1.20.0
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: test-stability
>         Attachments: FLINK-35552.testCheckpoint.log, 
> logs-ci-test_ci_core-1719524892.zip
>
>
> A test instability in 
> {{CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint}}
>  was observed where an expected {{OperatorEvent}} was missed:
> {code:java}
> Test 
> org.apache.flink.streaming.runtime.tasks.CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint[testCheckpoint()]
>  failed with:
> java.lang.AssertionError:
> Expecting actual:
>   [0,
>     1,
>     3,
>     4,
> [...]
>     98,
>     99]
> to contain exactly (and in same order):
>   [0,
>     1,
>     2,
>     3,
>     4,
> [...]
> but could not find the following elements:
>   [2]        at 
> org.apache.flink.runtime.operators.coordination.CoordinatorEventsExactlyOnceITCase.checkListContainsSequence(CoordinatorEventsExactlyOnceITCase.java:175)
>         at 
> org.apache.flink.streaming.runtime.tasks.CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.executeAndVerifyResults(CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.java:178)
>         at 
> org.apache.flink.streaming.runtime.tasks.CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint(CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.java:124)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) {code}
> The [build 
> failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60530&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8]
>  happened on commit 
> [2e853ce39a|https://github.com/flink-ci/flink/commit/2e853ce39aa2db8212402de3dcc0f049397887fd]
>  for FLINK-35552.
> I attached the logs of the relevant build artifacts and the extracted logs 
> for this specific test failure.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
