[jira] [Created] (FLINK-32316) Duplicated announceCombinedWatermark task maybe scheduled if jobmanager failover

Cai Liuyang (Jira) Mon, 12 Jun 2023 06:04:00 -0700

Cai Liuyang created FLINK-32316:
-----------------------------------

             Summary: Duplicated announceCombinedWatermark task maybe scheduled 
if jobmanager failover
                 Key: FLINK-32316
                 URL: https://issues.apache.org/jira/browse/FLINK-32316
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.16.0
            Reporter: Cai Liuyang



When we try SourceAlignment feature, we found there will be a duplicated 
announceCombinedWatermark task will be scheduled after JobManager failover, and 
auto recover job from checkpoint.

The reason i think is  we should schedule announceCombinedWatermark task during 
SourceCoordinator::start function not in SourceCoordinator construct function 
(see  
[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java#L149]
 ), because when jobManager encounter failover and auto recover job, it will 
create SourceCoordinator twice:
 * The first one is  when JobMaster is create it will create the 
DefaultExecutionGraph.
 * The Second one is JobMaster call restoreLatestCheckpointedStateInternal 
method, which will be reset old sourceCoordinator and initialize a new one, but 
because the first sourceCoordinator is not started(SourceCoordinator will be 
started before SchedulerBase::startScheduling, so the first SourceCoordinator 
will not be fully closed).

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-32316) Duplicated announceCombinedWatermark task maybe scheduled if jobmanager failover

Reply via email to