Cai Liuyang created FLINK-32316: ----------------------------------- Summary: Duplicated announceCombinedWatermark task maybe scheduled if jobmanager failover Key: FLINK-32316 URL: https://issues.apache.org/jira/browse/FLINK-32316 Project: Flink Issue Type: Bug Affects Versions: 1.16.0 Reporter: Cai Liuyang
When we try SourceAlignment feature, we found there will be a duplicated announceCombinedWatermark task will be scheduled after JobManager failover, and auto recover job from checkpoint. The reason i think is we should schedule announceCombinedWatermark task during SourceCoordinator::start function not in SourceCoordinator construct function (see [https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java#L149] ), because when jobManager encounter failover and auto recover job, it will create SourceCoordinator twice: * The first one is when JobMaster is create it will create the DefaultExecutionGraph. * The Second one is JobMaster call restoreLatestCheckpointedStateInternal method, which will be reset old sourceCoordinator and initialize a new one, but because the first sourceCoordinator is not started(SourceCoordinator will be started before SchedulerBase::startScheduling, so the first SourceCoordinator will not be fully closed). -- This message was sent by Atlassian Jira (v8.20.10#820010)