[ https://issues.apache.org/jira/browse/FLINK-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156021#comment-15156021 ]
ASF GitHub Bot commented on FLINK-3443:
---------------------------------------

Github user uce commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1669#discussion_r53566669

    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -1487,7 +1487,7 @@ class JobManager(
               }
             }

    -        eg.fail(cause)
    +        eg.cancel()
    --- End diff --

    Yes, that would work during shutdown, but there will be a chance that a `fail` right before `cancelAndClearEverything` will still result in the restarting behaviour, because multiple calls to `fail` are ignored when the job status is `FAILING`. `cancel` makes sure that this does not happen, because cancellation "overwrites" failing behaviour.

    If we say that this is OK as a corner case, we can keep the `fail` on `cancelAndClearEverything` and wrap the Exception to suppress restarts in the common case.


> JobManager cancel and clear everything fails jobs instead of cancelling
> -----------------------------------------------------------------------
>
>                 Key: FLINK-3443
>                 URL: https://issues.apache.org/jira/browse/FLINK-3443
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>
> When the job manager is shut down, it calls {{cancelAndClearEverything}}.
> This method does not {{cancel}} the {{ExecutionGraph}} instances, but
> {{fail}}s them, which can lead to {{ExecutionGraph}} restart.
> I've noticed this in tests, where old graph got into a loop of restarts.
> What I don't understand is why the futures etc. are not cancelled when the
> executor service is shut down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
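
The race described in the review comment can be illustrated with a toy state machine. This is only a sketch under stated assumptions, not Flink's actual `ExecutionGraph`: the class `ToyExecutionGraph`, its status enum, the `allTasksTerminated` hook and the marker exception `NoRestartException` are all hypothetical names introduced here. It models just the three rules the comment states: a second `fail` is ignored once the status is `FAILING`, `cancel` overrides `FAILING`, and a wrapped exception suppresses the restart.

```java
public class ToyExecutionGraph {
    // Simplified status set; an assumption, not Flink's real JobStatus enum.
    enum JobStatus { RUNNING, FAILING, CANCELLING, RESTARTING, FAILED, CANCELED }

    /** Hypothetical wrapper marking a failure that must not trigger a restart. */
    static class NoRestartException extends RuntimeException {
        NoRestartException(Throwable cause) { super(cause); }
    }

    private JobStatus status = JobStatus.RUNNING;
    private Throwable failureCause;

    JobStatus status() { return status; }

    /** Only the first fail() takes effect; later calls while FAILING are ignored. */
    void fail(Throwable cause) {
        if (status == JobStatus.RUNNING) {
            status = JobStatus.FAILING;
            failureCause = cause;
        }
    }

    /** Cancellation "overwrites" an in-progress failure. */
    void cancel() {
        if (status == JobStatus.RUNNING || status == JobStatus.FAILING) {
            status = JobStatus.CANCELLING;
        }
    }

    /** Called once every task has reached a terminal state. */
    void allTasksTerminated() {
        if (status == JobStatus.CANCELLING) {
            status = JobStatus.CANCELED;
        } else if (status == JobStatus.FAILING) {
            // A wrapped cause suppresses the restart; any other cause restarts the job.
            status = (failureCause instanceof NoRestartException)
                    ? JobStatus.FAILED
                    : JobStatus.RESTARTING;
        }
    }

    public static void main(String[] args) {
        // Corner case: a task failure arrives just before shutdown, so the
        // wrapped shutdown failure is ignored and the job still restarts.
        ToyExecutionGraph raced = new ToyExecutionGraph();
        raced.fail(new RuntimeException("task failure"));
        raced.fail(new NoRestartException(new RuntimeException("shutdown")));
        raced.allTasksTerminated();
        System.out.println("fail then wrapped fail: " + raced.status());

        // With cancel(), the earlier failure is overridden and no restart happens.
        ToyExecutionGraph cancelled = new ToyExecutionGraph();
        cancelled.fail(new RuntimeException("task failure"));
        cancelled.cancel();
        cancelled.allTasksTerminated();
        System.out.println("fail then cancel: " + cancelled.status());
    }
}
```

In this model the first scenario ends in `RESTARTING` and the second in `CANCELED`, which is the difference between keeping `fail` with a wrapped exception and switching to `cancel` in `cancelAndClearEverything`.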