[jira] [Updated] (FLINK-23871) Dispatcher should handle finishing job exception when recover

Aitozi (Jira) Thu, 19 Aug 2021 02:44:06 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Aitozi updated FLINK-23871:
---------------------------
    Description: 
An exception thrown while running a recovered job triggers a fatal error, a 
behavior introduced in https://issues.apache.org/jira/browse/FLINK-9097. If a 
job has reached a finished status but the process crashes during the clean-up 
phase or any other post-processing phase, then on recovery the dispatcher may 
pick up a job whose status is RunningJobsRegistry.JobSchedulingStatus.DONE, 
which may cause the dispatcher to fail fatally again. 

I think we should handle RunningJobsRegistry.JobSchedulingStatus.DONE with a 
dedicated exception such as JobFinishingException, which indicates that the 
job/master crashed during the job finishing phase, and only do the clean-up 
work for this exception.
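
A rough sketch of the idea, not the actual Dispatcher code. The class 
DispatcherRecoverySketch, its methods, and RecoveryOutcome are hypothetical 
names made up for illustration; JobFinishingException is only the exception 
proposed above and does not exist in Flink; the enum is a local stand-in for 
RunningJobsRegistry.JobSchedulingStatus.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the dispatcher's recovery path.
class DispatcherRecoverySketch {

    // Local stand-in for RunningJobsRegistry.JobSchedulingStatus.
    enum JobSchedulingStatus { PENDING, RUNNING, DONE }

    // Hypothetical result type, used only to make the sketch observable.
    enum RecoveryOutcome { RESUMED, CLEANED_UP }

    // Proposed exception: the job had already finished when the
    // job/master crashed, so only clean-up is left to do.
    static class JobFinishingException extends Exception {
        JobFinishingException(String jobId) {
            super("Job " + jobId + " already finished; only clean-up is pending");
        }
    }

    // Records which jobs were cleaned up instead of being re-run.
    final List<String> cleanedUpJobs = new ArrayList<>();

    // Assumed behavior: starting a recovered job that is already DONE
    // signals the situation with JobFinishingException.
    void runRecoveredJob(String jobId, JobSchedulingStatus status)
            throws JobFinishingException {
        if (status == JobSchedulingStatus.DONE) {
            throw new JobFinishingException(jobId);
        }
        // A real dispatcher would start the job here.
    }

    // Only JobFinishingException is downgraded to clean-up work; any
    // other (unchecked) failure still propagates to the caller.
    RecoveryOutcome recover(String jobId, JobSchedulingStatus status) {
        try {
            runRecoveredJob(jobId, status);
            return RecoveryOutcome.RESUMED;
        } catch (JobFinishingException e) {
            cleanedUpJobs.add(jobId);
            return RecoveryOutcome.CLEANED_UP;
        }
    }

    public static void main(String[] args) {
        DispatcherRecoverySketch dispatcher = new DispatcherRecoverySketch();
        System.out.println(dispatcher.recover("job-a", JobSchedulingStatus.RUNNING));
        System.out.println(dispatcher.recover("job-b", JobSchedulingStatus.DONE));
    }
}
```

The point of the sketch is that the DONE case is told apart from other 
recovery failures by its exception type, so the fatal-error path from 
FLINK-9097 could stay in place for everything else.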

  was:
The exception during run recovery job will trigger fatal error which is 
introduced in https://issues.apache.org/jira/browse/FLINK-9097. But if a job 
have reached a finished status. But crash at cleap up phase or any other post 
phase. When recover job, it may recover a job in 
RunningJobsRegistry.JobSchedulingStatus.DONE status, this may lead to the 
dispatcher fatal again. 

I think we should deal with the  RunningJobsRegistry.JobSchedulingStatus.DONE 
with special exception like JobFinishingException, which represents the 
job/master crashed in job finishing phase. And only do the clean up work for 
this exception


> Dispatcher should handle finishing job exception when recover
> -------------------------------------------------------------
>
>                 Key: FLINK-23871
>                 URL: https://issues.apache.org/jira/browse/FLINK-23871
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.2
>            Reporter: Aitozi
>            Priority: Major
>
> An exception thrown while running a recovered job triggers a fatal error, a 
> behavior introduced in https://issues.apache.org/jira/browse/FLINK-9097. If a 
> job has reached a finished status but the process crashes during the clean-up 
> phase or any other post-processing phase, then on recovery the dispatcher may 
> pick up a job whose status is RunningJobsRegistry.JobSchedulingStatus.DONE, 
> which may cause the dispatcher to fail fatally again. 
> I think we should handle RunningJobsRegistry.JobSchedulingStatus.DONE with a 
> dedicated exception such as JobFinishingException, which indicates that the 
> job/master crashed during the job finishing phase, and only do the clean-up 
> work for this exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
