Zhijiang Wang created FLINK-5501:
------------------------------------

             Summary: Determine whether the job starts from last JobManager failure
                 Key: FLINK-5501
                 URL: https://issues.apache.org/jira/browse/FLINK-5501
             Project: Flink
          Issue Type: Sub-task
          Components: JobManager
            Reporter: Zhijiang Wang

When the {{JobManagerRunner}} is granted leadership, it should check whether the current job is already running. If the job is running, the {{JobManager}} should reconcile itself (enter the {{RECONCILING}} state) and wait for the {{TaskManager}} to report task status. Otherwise, the {{JobManager}} can schedule the {{ExecutionGraph}} in the usual way.

The {{RunningJobsRegistry}} can provide a way to check the job's running status, but we should extend the current interface and fix the related process to support this function.

1. The {{RunningJobsRegistry}} sets the job status to RUNNING when the {{JobManagerRunner}} is granted leadership for the first time.
2. If the job finishes, the {{RunningJobsRegistry}} sets the job status to FINISHED, and the status is deleted before exit.
3. If the {{JobManager}} fails, the job status remains RUNNING, so when the {{JobManagerRunner}} (the previous one or a new one) is granted leadership again, it checks the job status and enters the {{RECONCILING}} state.
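
The decision described above could be sketched roughly as follows. This is only an illustration under stated assumptions: {{LeadershipSketch}}, {{InMemoryRegistry}}, {{RegistryStatus}}, {{Action}}, {{onGrantLeadership}} and {{onJobFinished}} are hypothetical names and do not reflect the actual {{RunningJobsRegistry}} interface, and a real registry would have to be backed by highly available storage rather than an in-memory map. The handling of a job found in FINISHED state on leadership grant is also an assumption, since the description does not cover it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these types are Flink's actual API.
class LeadershipSketch {

    /** Status recorded per job in the (hypothetical) registry. */
    enum RegistryStatus { RUNNING, FINISHED }

    /** What the runner does after being granted leadership. */
    enum Action { SCHEDULE, RECONCILE, NONE }

    /** In-memory stand-in for a highly available running-jobs registry. */
    static final class InMemoryRegistry {
        private final Map<String, RegistryStatus> statuses = new ConcurrentHashMap<>();

        RegistryStatus get(String jobId) { return statuses.get(jobId); }
        void setRunning(String jobId) { statuses.put(jobId, RegistryStatus.RUNNING); }
        void setFinished(String jobId) { statuses.put(jobId, RegistryStatus.FINISHED); }
        void clear(String jobId) { statuses.remove(jobId); }
    }

    /** Called when the runner is granted leadership for the given job. */
    static Action onGrantLeadership(InMemoryRegistry registry, String jobId) {
        RegistryStatus status = registry.get(jobId);
        if (status == null) {
            // Step 1: first grant, so mark the job RUNNING and schedule it normally.
            registry.setRunning(jobId);
            return Action.SCHEDULE;
        }
        if (status == RegistryStatus.RUNNING) {
            // Step 3: a previous JobManager failed while the job was running.
            return Action.RECONCILE;
        }
        // Assumption: a job already marked FINISHED needs no further action.
        return Action.NONE;
    }

    /** Step 2: mark the job FINISHED, then delete the entry before exit. */
    static void onJobFinished(InMemoryRegistry registry, String jobId) {
        registry.setFinished(jobId);
        registry.clear(jobId);
    }
}
```

In this sketch, the first call to {{onGrantLeadership}} for a job yields {{SCHEDULE}}, and a second call without an intervening {{onJobFinished}} (which models a {{JobManager}} failure) yields {{RECONCILE}}.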