[ https://issues.apache.org/jira/browse/FLINK-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann updated FLINK-23871:
----------------------------------
    Fix Version/s:     (was: 1.12.6)

> Dispatcher should handle finishing job exception when recover
> -------------------------------------------------------------
>
>                 Key: FLINK-23871
>                 URL: https://issues.apache.org/jira/browse/FLINK-23871
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0, 1.12.5, 1.13.2
>            Reporter: Aitozi
>            Assignee: Aitozi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0, 1.13.3
>
>
> An exception thrown while running a recovered job triggers a fatal error;
> this behavior was introduced in
> https://issues.apache.org/jira/browse/FLINK-9097. If a job has reached a
> finished status but the process crashes during the clean-up phase or any
> other post-finish phase, recovery may pick up a job whose status is
> RunningJobsRegistry.JobSchedulingStatus.DONE, which may cause the
> dispatcher to hit the fatal error again.
> I think we should handle RunningJobsRegistry.JobSchedulingStatus.DONE with
> a dedicated exception such as JobFinishingException, which indicates that
> the job/master crashed during the job-finishing phase, and only do the
> clean-up work for this exception.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
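
The handling proposed in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the change that was merged for this issue: the `JobSchedulingStatus` enum is a local stand-in for `RunningJobsRegistry.JobSchedulingStatus`, `JobFinishingException` is the name suggested in the description, and `RecoverySketch`, `runRecoveredJob`, `recover`, and `Outcome` are hypothetical names that do not exist in Flink.

```java
// Illustrative sketch only; none of these types are Flink classes.
class RecoverySketch {

    /** Local stand-in for RunningJobsRegistry.JobSchedulingStatus. */
    public enum JobSchedulingStatus { PENDING, RUNNING, DONE }

    /** Hypothetical result type so the chosen path can be inspected. */
    public enum Outcome { RESUMED, CLEANED_UP }

    /** Exception name taken from the issue: the job had already finished before the crash. */
    public static class JobFinishingException extends Exception {
        public JobFinishingException(String jobId) {
            super("Job " + jobId + " had already finished before recovery");
        }
    }

    /** Hypothetical recovery step: refuses to rerun a job whose status is DONE. */
    public static void runRecoveredJob(String jobId, JobSchedulingStatus status)
            throws JobFinishingException {
        if (status == JobSchedulingStatus.DONE) {
            throw new JobFinishingException(jobId);
        }
        // A real dispatcher would start the job here.
    }

    /** Hypothetical caller: the dedicated exception leads to clean-up only, not a fatal error. */
    public static Outcome recover(String jobId, JobSchedulingStatus status) {
        try {
            runRecoveredJob(jobId, status);
            return Outcome.RESUMED;
        } catch (JobFinishingException e) {
            // Only the leftover clean-up work would be done for this job.
            return Outcome.CLEANED_UP;
        }
    }

    public static void main(String[] args) {
        System.out.println(recover("job-1", JobSchedulingStatus.DONE));
        System.out.println(recover("job-2", JobSchedulingStatus.RUNNING));
    }
}
```

Using a checked exception that is distinct from other recovery failures lets the caller separate the "already finished" case from genuine errors, which could still be escalated as fatal.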