Aitozi created FLINK-23871:
------------------------------

             Summary: Dispatcher should handle finishing job exception when recover
                 Key: FLINK-23871
                 URL: https://issues.apache.org/jira/browse/FLINK-23871
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.13.2
            Reporter: Aitozi

An exception thrown while running a recovered job triggers a fatal error in the dispatcher. This behavior was introduced in https://issues.apache.org/jira/browse/FLINK-9097. However, a job may have reached a finished status and then crashed during the clean-up phase or some other post-completion phase. On recovery, the dispatcher may then pick up a job whose status is RunningJobsRegistry.JobSchedulingStatus.DONE, which can cause the dispatcher to fail fatally again.

I think we should handle RunningJobsRegistry.JobSchedulingStatus.DONE with a special exception such as JobFinishingException, which indicates that the job/master crashed during the job finishing phase, and do only the clean-up work for this exception.
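
A rough sketch of the proposed handling is shown here. It is not Flink's actual Dispatcher code: RecoverySketch, recoverJob, runRecoveredJob, the start callback, and the recorded action strings are all hypothetical stand-ins. The enum only mirrors the values of RunningJobsRegistry.JobSchedulingStatus, and JobFinishingException is the exception name suggested in this issue, not an existing class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the dispatcher's recovery path.
class RecoverySketch {

    // Mirrors the values of RunningJobsRegistry.JobSchedulingStatus.
    enum JobSchedulingStatus { PENDING, RUNNING, DONE }

    // The special exception proposed in this issue (does not exist in Flink).
    static class JobFinishingException extends Exception {
        JobFinishingException(String jobId) {
            super("Job " + jobId + " crashed in its finishing phase");
        }
    }

    // Records what the sketch decided to do for each job.
    final List<String> actions = new ArrayList<>();

    // Refuses to start a job that is already DONE; otherwise runs the
    // hypothetical start callback, which may itself throw.
    void runRecoveredJob(String jobId, JobSchedulingStatus status, Runnable start)
            throws JobFinishingException {
        if (status == JobSchedulingStatus.DONE) {
            throw new JobFinishingException(jobId);
        }
        start.run();
        actions.add("run:" + jobId);
    }

    void recoverJob(String jobId, JobSchedulingStatus status, Runnable start) {
        try {
            runRecoveredJob(jobId, status, start);
        } catch (JobFinishingException e) {
            // Job already finished: only clean up, do not escalate.
            actions.add("cleanup:" + jobId);
        } catch (RuntimeException e) {
            // Any other failure keeps the fatal-error behavior from FLINK-9097.
            actions.add("fatal:" + jobId);
        }
    }
}
```

The point of the separate exception type is that the DONE case can be told apart from genuine recovery failures, so only the latter escalate to a fatal error.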