Zhijiang Wang created FLINK-5501:
------------------------------------

             Summary: Determine whether the job starts from last JobManager 
failure
                 Key: FLINK-5501
                 URL: https://issues.apache.org/jira/browse/FLINK-5501
             Project: Flink
          Issue Type: Sub-task
          Components: JobManager
            Reporter: Zhijiang Wang


When the {{JobManagerRunner}} is granted leadership, it should check whether 
the current job is already running. If the job is running, the 
{{JobManager}} should reconcile itself (enter the RECONCILING state) and wait 
for the {{TaskManager}} to report task status. Otherwise, the {{JobManager}} 
can schedule the {{ExecutionGraph}} in the usual way.
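
A minimal sketch of that decision, assuming the running check is available 
as a simple lookup. The names {{LeadershipDecision}}, {{StartAction}} and 
{{isJobRunning}} are hypothetical and used only for illustration; they are 
not existing Flink APIs.

```java
import java.util.function.Predicate;

/** Hypothetical helper: decides what to do when leadership is granted. */
final class LeadershipDecision {

    /** The two outcomes described above. */
    enum StartAction { RECONCILE, SCHEDULE }

    /**
     * @param jobId        identifier of the job owned by this runner
     * @param isJobRunning assumed lookup that reports whether the job
     *                     was already marked as running
     */
    static StartAction onGrantLeadership(String jobId, Predicate<String> isJobRunning) {
        // A job that is already running must be reconciled with the
        // task status reported by the TaskManagers, not rescheduled.
        return isJobRunning.test(jobId) ? StartAction.RECONCILE : StartAction.SCHEDULE;
    }
}
```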

The {{RunningJobsRegistry}} can provide a way to check whether the job is 
running, but we should extend the current interface and adjust the related 
process to support this.

1. {{RunningJobsRegistry}} sets the RUNNING status after the 
{{JobManagerRunner}} is granted leadership for the first time.
2. If the job finishes, {{RunningJobsRegistry}} sets the job status to 
FINISHED, and the status is deleted before exit. 
3. If the {{JobManager}} fails, the job status remains RUNNING, so when the 
{{JobManagerRunner}} (the previous one or a new one) is granted leadership 
again, it checks the job status and enters the {{RECONCILING}} state.
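
The three steps above could be sketched roughly as follows. This is an 
in-memory illustration only: {{RegistrySketch}}, {{RunnerSketch}}, the 
{{RegistryStatus}} values and all method names are assumptions made for the 
example, not the existing {{RunningJobsRegistry}} interface, and a real 
registry would need to survive a {{JobManager}} failure (for example by 
using highly available storage).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical status values; UNKNOWN means no entry exists for the job. */
enum RegistryStatus { UNKNOWN, RUNNING, FINISHED }

/** In-memory stand-in for a registry that outlives a single runner. */
final class RegistrySketch {
    private final Map<String, RegistryStatus> statuses = new ConcurrentHashMap<>();

    void setRunning(String jobId)  { statuses.put(jobId, RegistryStatus.RUNNING); }
    void setFinished(String jobId) { statuses.put(jobId, RegistryStatus.FINISHED); }
    void clear(String jobId)       { statuses.remove(jobId); }

    RegistryStatus getStatus(String jobId) {
        return statuses.getOrDefault(jobId, RegistryStatus.UNKNOWN);
    }
}

/** Hypothetical runner showing how the status drives the start-up path. */
final class RunnerSketch {
    enum Action { SCHEDULE, RECONCILE, NONE }

    private final RegistrySketch registry;
    private final String jobId;

    RunnerSketch(RegistrySketch registry, String jobId) {
        this.registry = registry;
        this.jobId = jobId;
    }

    Action grantLeadership() {
        switch (registry.getStatus(jobId)) {
            case RUNNING:
                // Step 3: a previous leader failed while the job was running.
                return Action.RECONCILE;
            case FINISHED:
                // The job completed but its entry was not yet deleted.
                return Action.NONE;
            default:
                // Step 1: first leadership grant, mark the job as running.
                registry.setRunning(jobId);
                return Action.SCHEDULE;
        }
    }

    void jobFinished() {
        // Step 2: record FINISHED, then delete the entry before exit.
        registry.setFinished(jobId);
        registry.clear(jobId);
    }
}
```

In this sketch, a second {{RunnerSketch}} created over the same registry 
after a simulated failure sees RUNNING and chooses to reconcile, while a 
runner for a job with no entry schedules it from scratch.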



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
