[ 
https://issues.apache.org/jira/browse/FLINK-22488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-22488:
-----------------------------------
    Labels: pull-request-available  (was: )

> KafkaSourceLegacyITCase.testOneToOneSources failed due to "OperatorEvent from 
> an OperatorCoordinator to a task was lost"
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-22488
>                 URL: https://issues.apache.org/jira/browse/FLINK-22488
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Dong Lin
>            Priority: Minor
>              Labels: pull-request-available
>
> According to [1], the test KafkaSourceLegacyITCase.testOneToOneSources failed 
> because it runs a streaming job (which uses KafkaSource) with 
> restartAttempts=1. In addition to the failover explicitly triggered by the 
> FailingIdentityMapper, the job also failed due to 
> "org.apache.flink.util.FlinkException: An OperatorEvent from an 
> OperatorCoordinator to a task was lost. Triggering task failover to ensure 
> consistency", which the test does not expect.
> Note that SubtaskGatewayImpl was updated by [2] on 4/14 to trigger task 
> failover if any OperatorEvent is lost. This could explain why those Kafka 
> tests started to fail with the exception described above.
> To make this test stable, let's try to understand why there is such a high 
> chance of losing an OperatorEvent in the Azure test pipeline. If we cannot 
> avoid losing OperatorEvents in the test pipeline, we probably need to update 
> the test to allow the job to be restarted an arbitrary number of times (and 
> still be able to stop the test on the happy path).
> [1] 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17212&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5&l=6960
> [2] https://github.com/apache/flink/pull/15605
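
The restart-budget reasoning in the description can be illustrated with a small, self-contained sketch. It does not use any Flink API: the class RestartBudgetSketch and the method completes are hypothetical names, and the model only assumes that each failure consumes one restart attempt and that the job fails terminally once the budget is exhausted.

```java
public class RestartBudgetSketch {

    /**
     * Hypothetical model, not Flink code: returns true if a job that hits
     * `failures` failures can still finish when at most `maxRestarts`
     * restarts are allowed.
     */
    public static boolean completes(int maxRestarts, int failures) {
        int restartsUsed = 0;
        for (int i = 0; i < failures; i++) {
            if (restartsUsed == maxRestarts) {
                // No restart attempts left: the job fails terminally.
                return false;
            }
            restartsUsed++;
        }
        return true;
    }

    public static void main(String[] args) {
        // Only the failure injected by the test: one restart is enough.
        System.out.println(completes(1, 1));
        // Injected failure plus one failover from a lost OperatorEvent.
        System.out.println(completes(1, 2));
        // An effectively unbounded budget tolerates the extra failover.
        System.out.println(completes(Integer.MAX_VALUE, 2));
    }
}
```

Under these assumptions, a budget of one restart covers the injected failure alone but not an additional failover, which matches the failure mode described above.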



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
