[ https://issues.apache.org/jira/browse/FLINK-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814052#comment-16814052 ]
TisonKun commented on FLINK-11813:
----------------------------------

[~zhuzh] It looks like an ingenious perspective. With this approach we do not need to introduce a notify mechanism or a clean-up stage. I'd like to clarify *how* and when {{JobSchedulingStatus}} is published (modified).

1. {{NONE}}: initially there is no znode with path and data {{running_jobs_registry/job_id(data: JobSchedulingStatus)}}, and thus we recognize the job as being in status {{NONE}} (I'd prefer to refer to it as {{INVALID}}. WDYT?)
2. {{PENDING}}: when a job is submitted to the Dispatcher, *the Dispatcher* publishes the job scheduling status as {{PENDING}}.
3. {{RUNNING}}: when a job starts to be scheduled (i.e., a corresponding JobMaster is started), *the Dispatcher* publishes the job scheduling status as {{RUNNING}}.
4. Again {{NONE}}: when a job is globally terminated, *the Dispatcher* removes the znode {{running_jobs_registry/job_id(data: JobSchedulingStatus)}}, effectively publishing its status as {{NONE}}.

The point is that we could go a bit further when considering interfaces like {{(un)registerJob}}: let only the Dispatcher publish the job scheduling status. Currently a RUNNING status is published by the JobManagerRunner, which makes the responsibility unclear. With only the Dispatcher publishing the status, a JM should only read from the jobs registry and decide whether and how to start the job. No extra mechanism needs to be involved. A minimal sketch of this division of responsibility follows after the quoted issue description below.

> Standby per job mode Dispatchers don't know job's JobSchedulingStatus
> ---------------------------------------------------------------------
>
>                  Key: FLINK-11813
>                  URL: https://issues.apache.org/jira/browse/FLINK-11813
>              Project: Flink
>           Issue Type: Bug
>           Components: Runtime / Coordination
>     Affects Versions: 1.6.4, 1.7.2, 1.8.0
>             Reporter: Till Rohrmann
>             Priority: Major
>
> At the moment, it can happen that standby {{Dispatchers}} in per job mode will restart a terminated job after they gained leadership. The problem is that we currently clear the {{RunningJobsRegistry}} once a job has reached a globally terminal state. After the leading {{Dispatcher}} terminates, a standby {{Dispatcher}} will gain leadership. Without having the information from the {{RunningJobsRegistry}} it cannot tell whether the job has been executed or whether the {{Dispatcher}} needs to re-execute the job. At the moment, the {{Dispatcher}} will assume that there was a fault and hence re-execute the job. This can lead to duplicate results.
> I think we need some way to tell standby {{Dispatchers}} that a certain job has been successfully executed. One trivial solution could be to not clean up the {{RunningJobsRegistry}} but then we will clutter ZooKeeper.
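To make the proposed division of responsibility a bit more concrete, here is a minimal, self-contained Java sketch. It only illustrates the scheme described in the comment above, not Flink's actual {{RunningJobsRegistry}} or Dispatcher code; names such as {{JobsRegistry}}, {{publish}}, {{remove}} and {{shouldStartExecution}} are made up for the example. The Dispatcher is the only writer of the scheduling status, while a JM (or a standby Dispatcher that just gained leadership) only reads it.

{code:java}
// Illustrative sketch only: these class and method names are NOT Flink's actual
// RunningJobsRegistry API; they mirror the lifecycle proposed in the comment.
public class DispatcherOnlyRegistrySketch {

    /** Proposed lifecycle: NONE (no znode) -> PENDING -> RUNNING -> NONE (znode removed). */
    public enum JobSchedulingStatus { NONE, PENDING, RUNNING }

    /** Minimal abstraction over the running_jobs_registry/<job_id> znodes. */
    public interface JobsRegistry {
        JobSchedulingStatus getStatus(String jobId);              // NONE if the znode is absent
        void publish(String jobId, JobSchedulingStatus status);   // create or update the znode
        void remove(String jobId);                                // delete the znode, i.e. back to NONE
    }

    private final JobsRegistry registry;

    public DispatcherOnlyRegistrySketch(JobsRegistry registry) {
        this.registry = registry;
    }

    // --- Dispatcher side: the only writer of the scheduling status ---

    /** Step 2: on job submission the Dispatcher publishes PENDING. */
    public void onJobSubmitted(String jobId) {
        registry.publish(jobId, JobSchedulingStatus.PENDING);
    }

    /** Step 3: when the Dispatcher starts the corresponding JobMaster, it publishes RUNNING. */
    public void onJobMasterStarted(String jobId) {
        registry.publish(jobId, JobSchedulingStatus.RUNNING);
    }

    /** Step 4: on a globally terminal state the Dispatcher removes the znode (status back to NONE). */
    public void onJobGloballyTerminated(String jobId) {
        registry.remove(jobId);
    }

    // --- JM / standby side: read-only ---

    /**
     * One possible read-only decision for a JM (or a standby Dispatcher gaining leadership):
     * the job is (re-)started only if the Dispatcher left it in a non-terminal status; an
     * absent znode (NONE) means it was never submitted or has already globally terminated.
     */
    public boolean shouldStartExecution(String jobId) {
        JobSchedulingStatus status = registry.getStatus(jobId);
        return status == JobSchedulingStatus.PENDING || status == JobSchedulingStatus.RUNNING;
    }
}
{code}

With this split, the registry entry is purely an output of the Dispatcher's view of the job lifecycle, and every other component treats it as read-only input, so no notification or clean-up mechanism beyond removing the znode on global termination is needed.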