[ https://issues.apache.org/jira/browse/FLINK-23862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-23862:
-----------------------------------
    Labels: pull-request-available  (was: )

> Race condition while cancelling task during initialization
> ----------------------------------------------------------
>
>                 Key: FLINK-23862
>                 URL: https://issues.apache.org/jira/browse/FLINK-23862
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.14.0
>            Reporter: Roman Khachatryan
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> While debugging the recent failures in FLINK-22889, I see that sometimes the
> operator chain is not closed if the task is cancelled while it is being
> initialized.
>
> The reason is that in restore(), cleanUpInvoke() is only called if there was
> an exception, including CancelTaskException.
> The latter is only thrown if StreamTask.canceled is set, i.e. after TaskCanceler
> has called StreamTask.cancel().
>
> So if StreamTask is cancelled between restore() and the normal invoke(), it
> may neither close the operator chain nor perform the other cleanup.
>
> One solution is to make StreamTask.cleanup visible to, and called from, Task.
>
> cc: [~akalashnikov], [~pnowojski]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
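The race described in the issue can be illustrated with a small Java sketch. Everything in it is hypothetical: `SimpleStreamTask`, `SimpleTask`, and their methods are simplified stand-ins, not Flink's `StreamTask`/`Task` classes or API. The sketch assumes the Task skips `invoke()` once it observes cancellation, so `restore()` returns normally and nothing on that path closes the operator chain. Passing `callCleanupFromTask = true` models the proposed fix of letting Task call the StreamTask cleanup itself.

```java
class CancelDuringInitSketch {

    /** Hypothetical stand-in for Flink's CancelTaskException. */
    static class CancelTaskException extends RuntimeException {
        CancelTaskException() {
            super("task canceled");
        }
    }

    /** Hypothetical, heavily simplified stand-in for StreamTask. */
    static class SimpleStreamTask {
        volatile boolean canceled = false;
        volatile boolean operatorChainClosed = false;

        void restore() {
            try {
                // ... open operators and restore state here ...
                if (canceled) {
                    throw new CancelTaskException();
                }
            } catch (RuntimeException e) {
                // Cleanup runs only on the exceptional path, as the report describes.
                cleanUpInvoke();
                throw e;
            }
        }

        void invoke() {
            try {
                if (canceled) {
                    throw new CancelTaskException();
                }
                // ... process records here ...
            } finally {
                cleanUpInvoke();
            }
        }

        void cancel() {
            canceled = true;
        }

        /** Closes the operator chain; safe to call more than once. */
        void cleanUpInvoke() {
            operatorChainClosed = true;
        }
    }

    /** Hypothetical stand-in for Task; the skip-invoke check is an assumption of this sketch. */
    static class SimpleTask {
        final SimpleStreamTask invokable = new SimpleStreamTask();
        volatile boolean canceling = false;

        /** Models TaskCanceler: mark the Task as canceling, then cancel the StreamTask. */
        void cancel() {
            canceling = true;
            invokable.cancel();
        }

        void run(Runnable betweenRestoreAndInvoke, boolean callCleanupFromTask) {
            try {
                invokable.restore();
                // The race window, made deterministic for illustration.
                betweenRestoreAndInvoke.run();
                if (canceling) {
                    // restore() returned normally and invoke() is skipped,
                    // so nothing on this path closes the operator chain.
                    return;
                }
                invokable.invoke();
            } catch (CancelTaskException e) {
                // Cancellation is expected; swallow it in this sketch.
            } finally {
                if (callCleanupFromTask) {
                    // Proposed fix: Task can see and call the StreamTask cleanup.
                    invokable.cleanUpInvoke();
                }
            }
        }
    }

    public static void main(String[] args) {
        SimpleTask buggy = new SimpleTask();
        buggy.run(buggy::cancel, false);
        System.out.println("without fix, chain closed: " + buggy.invokable.operatorChainClosed);

        SimpleTask fixed = new SimpleTask();
        fixed.run(fixed::cancel, true);
        System.out.println("with fix, chain closed: " + fixed.invokable.operatorChainClosed);
    }
}
```

In this sketch the unfixed run leaves `operatorChainClosed` false and the fixed run sets it to true. Keeping `cleanUpInvoke()` idempotent matters for the fix as modeled here, because on the exceptional path it can run once inside `restore()` and again from the Task's `finally` block.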