[ https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-12183:
-----------------------------------
    Labels: pull-request-available  (was: )

> Job Cluster doesn't release resources after cancel a running job in per-job Yarn mode
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-12183
>                 URL: https://issues.apache.org/jira/browse/FLINK-12183
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / REST
>    Affects Versions: 1.6.4, 1.7.2, 1.8.0
>            Reporter: Yumeng Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> The per-job Yarn cluster doesn't release its resources after a running job
> is cancelled if the job has restarted many times (for example, 1000 times)
> in a short period.
> The bug is in the archiveExecutionGraph() phase, which runs before
> removeJobAndRegisterTerminationFuture(). The CompletableFuture thread exits
> unexpectedly with a NullPointerException during archiveExecutionGraph().
> This is hard to notice because only IOException is caught there.
> In SubtaskExecutionAttemptDetailsHandler and
> SubtaskExecutionAttemptAccumulatorsHandler, the archiveJsonWithPath() method
> builds JSON for the prior execution attempts, but its for loop starts at
> index 0, which may refer to an attempt that has already been dropped.
> By default, getting a dropped prior execution attempt returns null
> (AccessExecution attempt = subtask.getPriorExecutionAttempt(x)).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
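
The failure mode described in the issue can be sketched in isolation. The following is a minimal illustration, not Flink code: BoundedHistory, archiveUnsafe and archiveSafe are hypothetical names, and the history is assumed to keep only the most recent entries and return null for indices that were dropped. Under that assumption, a loop starting at index 0 dereferences null once the job has restarted more often than the history can hold.

```java
import java.util.ArrayList;
import java.util.List;

public class PriorAttemptsSketch {

    /** Hypothetical stand-in for a bounded attempt history; not a Flink class. */
    static final class BoundedHistory {
        private final String[] ring;
        private int count; // total number of attempts ever added

        BoundedHistory(int capacity) {
            this.ring = new String[capacity];
        }

        void add(String attempt) {
            ring[count % ring.length] = attempt; // overwrites the oldest entry
            count++;
        }

        int size() {
            return count;
        }

        /** Returns null for valid indices whose entries were already dropped. */
        String get(int index) {
            if (index < 0 || index >= count) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            if (index < count - ring.length) {
                return null; // dropped attempt
            }
            return ring[index % ring.length];
        }
    }

    /** Mirrors the reported pattern: loop from 0 and use the result unchecked. */
    static List<String> archiveUnsafe(BoundedHistory history) {
        List<String> archived = new ArrayList<>();
        for (int x = 0; x < history.size(); x++) {
            String attempt = history.get(x);
            archived.add(attempt.toUpperCase()); // NullPointerException if dropped
        }
        return archived;
    }

    /** One possible guard (an assumption, not the merged fix): skip dropped entries. */
    static List<String> archiveSafe(BoundedHistory history) {
        List<String> archived = new ArrayList<>();
        for (int x = 0; x < history.size(); x++) {
            String attempt = history.get(x);
            if (attempt != null) {
                archived.add(attempt.toUpperCase());
            }
        }
        return archived;
    }

    public static void main(String[] args) {
        BoundedHistory history = new BoundedHistory(3);
        for (int i = 0; i < 5; i++) {
            history.add("attempt-" + i);
        }
        System.out.println(archiveSafe(history));
        try {
            archiveUnsafe(history);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException at a dropped index");
        }
    }
}
```

Because the reported code path only catches IOException, an unchecked exception like this one would escape the archiving step, which matches the symptom that the later cleanup is never reached.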