[jira] [Updated] (FLINK-12183) Job Cluster doesn't release resources after cancel a running job in per-job Yarn mode

ASF GitHub Bot (JIRA) Sat, 13 Apr 2019 03:28:51 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated FLINK-12183:
-----------------------------------
    Labels: pull-request-available  (was: )

> Job Cluster doesn't release resources after cancel a running job in per-job 
> Yarn mode
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-12183
>                 URL: https://issues.apache.org/jira/browse/FLINK-12183
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / REST
>    Affects Versions: 1.6.4, 1.7.2, 1.8.0
>            Reporter: Yumeng Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> The per-job Yarn cluster doesn't releases resources after cancel a running 
> job if the job restarted many times, like 1000 times, in a short time.
> The bug is in archiveExecutionGraph() phase before executing 
> removeJobAndRegisterTerminationFuture(). The CompletableFuture thread will 
> exit unexpectedly with NullPointerException in archiveExecutionGraph() phase. 
> It's hard to find that because here it only catches IOException. In 
> SubtaskExecutionAttemptDetailsHandler and  
> SubtaskExecutionAttemptAccumulatorsHandler, when calling 
> archiveJsonWithPath() method, it will construct some json information about 
> prior execution attempts but the index is from 0 which might be dropped index 
> for the for loop.  In default, it will return null when trying to get the 
> prior execution attempt (AccessExecution attempt = 
> subtask.getPriorExecutionAttempt(x)).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (FLINK-12183) Job Cluster doesn't release resources after cancel a running job in per-job Yarn mode

Reply via email to