rkhachatryan commented on code in PR #21923: URL: https://github.com/apache/flink/pull/21923#discussion_r1124206563
########## flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java: ########## @@ -733,8 +764,7 @@ void restoreInternal() throws Exception { // needed channelIOExecutor.shutdown(); - isRunning = true; - isRestoring = false; + taskState.status = TaskState.Status.RUNNING; Review Comment: Hey Panos, > the [invokable method](https://github.com/apache/flink/pull/21923/files#diff-0c5fe245445b932fa83fdaf5c4802dbe671b73e8d76f39058c6eaaaffd9639faL982-L983) that simultaneously marks the task as not Running and Canceled Yes, but after that, the Task Thread might update `isRunning`, but not `cancelled`. 1. Task Thread enters `restoreInternal` and passes `ensureNotCanceled` 2. Canceller Thread calls `invokable.cancel` and sets `isRunning = false`, `canceled = true` 3. Task Thread continues and sets `isRunning = true`, `isRestoring = false` 4. Legacy Source Thread gets interrupted and checks `isCancelled` - it's `true`, so `CancelTaskException` is thrown, which won't fail the job With this PR, all the flags are combined and updated unconditionally: 1. Task Thread enters `restoreInternal` and passes `ensureNotCanceled` 2. Canceller Thread calls `invokable.cancel` and sets `taskState.status = TaskState.Status.CANCELED` 3. Task Thread continues and sets `taskState.status = TaskState.Status.RUNNING` 4. Legacy Source Thread gets interrupted and checks `isCancelled`, i.e. `taskState.status == TaskState.Status.CANCELED`; which is `false`, so some other `Throwable` is thrown, which **does** fail the job -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org