Aitozi created FLINK-23871:
------------------------------
Summary: Dispatcher should handle finishing job exception when
recover
Key: FLINK-23871
URL: https://issues.apache.org/jira/browse/FLINK-23871
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.13.2
Reporter: Aitozi
An exception thrown while running a recovered job triggers a fatal error in
the dispatcher. This behavior was introduced in
https://issues.apache.org/jira/browse/FLINK-9097. However, a job may already
have reached a finished status when the process crashes during the cleanup
phase or any other post-completion phase. On recovery, the dispatcher may then
pick up a job whose status is RunningJobsRegistry.JobSchedulingStatus.DONE,
which can cause the dispatcher to hit the fatal error again.
I think we should handle RunningJobsRegistry.JobSchedulingStatus.DONE with a
dedicated exception such as JobFinishingException, which indicates that the
job/master crashed in the job finishing phase, and only perform the cleanup
work when this exception is raised.
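
A minimal sketch of the proposed handling, to make the idea concrete. It is
not Flink code: only the status values PENDING, RUNNING and DONE and the name
JobFinishingException come from this issue and from RunningJobsRegistry. The
class RecoverySketch, the methods runRecoveredJob and recover, and the string
results are hypothetical stand-ins for the dispatcher's recovery path.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoverySketch {

    // Mirrors the values of RunningJobsRegistry.JobSchedulingStatus.
    enum JobSchedulingStatus { PENDING, RUNNING, DONE }

    // Hypothetical exception named after the proposal in this issue.
    static class JobFinishingException extends Exception {
        JobFinishingException(String jobId) {
            super("Job " + jobId + " already finished; only cleanup is pending");
        }
    }

    // Hypothetical stand-in for starting a recovered job. A job that is
    // already DONE is reported with the dedicated exception instead of a
    // generic failure.
    static void runRecoveredJob(String jobId, JobSchedulingStatus status)
            throws JobFinishingException {
        if (status == JobSchedulingStatus.DONE) {
            throw new JobFinishingException(jobId);
        }
        // Otherwise the job would be started normally (omitted in this sketch).
    }

    // Hypothetical recovery entry point: only JobFinishingException is
    // downgraded to "clean up and continue"; any other failure still counts
    // as fatal, as it does since FLINK-9097.
    static String recover(String jobId, JobSchedulingStatus status, List<String> cleanedUp) {
        try {
            runRecoveredJob(jobId, status);
            return "STARTED";
        } catch (JobFinishingException e) {
            cleanedUp.add(jobId); // stands in for the real cleanup work
            return "CLEANED_UP";
        } catch (RuntimeException e) {
            return "FATAL";
        }
    }

    public static void main(String[] args) {
        List<String> cleanedUp = new ArrayList<>();
        System.out.println(recover("job-1", JobSchedulingStatus.DONE, cleanedUp));
        System.out.println(recover("job-2", JobSchedulingStatus.RUNNING, cleanedUp));
        System.out.println("cleaned up: " + cleanedUp);
    }
}
```

The point of the dedicated exception type is that the recovery path can tell
"job already finished, cleanup still pending" apart from genuine recovery
failures, so the fatal-error behavior stays unchanged for everything else.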
--
This message was sent by Atlassian Jira
(v8.3.4#803005)