rkhachatryan commented on code in PR #21923: URL: https://github.com/apache/flink/pull/21923#discussion_r1124883714
##########
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java:
##########

```diff
@@ -733,8 +764,7 @@ void restoreInternal() throws Exception {
         // needed
         channelIOExecutor.shutdown();

-        isRunning = true;
-        isRestoring = false;
+        taskState.status = TaskState.Status.RUNNING;
```

Review Comment:

> we don't need to keep that logic since if the job is canceled already we can easily ignore other exceptions which can happen since they can be actually because of cancelation

Please note that Task cancellation is not the same as Job cancellation. For example, in case of some intermittent failure, the Job will be `RESTARTING`, but the tasks will be cancelled and then rescheduled.

Ideally, we shouldn't be ignoring **any** exception during restart (just interruptions and cancellations). Otherwise, it might create a resource leak (job correctness won't be affected though).

As for the `running` flag (which will report `true` on `master`; and `false` when CAS to `CANCELLED`), I briefly skimmed through its usages:

- `Task.restoreAndInvoke` -> `StreamTask.invoke` -> `StreamTask.restoreInternal` — regression: it will allow a 2nd `restoreInternal`
- ongoing checkpoints will not be aborted by RPC — that's fine
- local state will not be pruned by the checkpoint-abort RPC — regression (but **likely** will be pruned on the next checkpoint)
- ...

The above can probably be fixed (in this PR), but my point is that it's hard to check all the combinations and their usages. If that's true, I'd stick closer to the original behavior by keeping the separate states **unless** we are sure that they are impossible.
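To illustrate the "2nd `restoreInternal`" regression the reviewer mentions: a plain boolean flag (or an unconditional assignment) does not reject a re-entrant restore, whereas a CAS-guarded transition does. Below is a minimal, hypothetical sketch (the class and method names are illustrative, not Flink's actual `TaskState`/`StreamTask` API) showing how an atomic status enum can make a second restore attempt fail fast:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a single status enum replacing separate
// isRunning/isRestoring booleans, with CAS-guarded transitions so that
// a second restore attempt is rejected, even if the task was
// concurrently moved to CANCELLED.
public class TaskStateMachine {
    enum Status { CREATED, RESTORING, RUNNING, CANCELLED }

    private final AtomicReference<Status> status =
            new AtomicReference<>(Status.CREATED);

    /** Returns true only for the first caller; re-entry and cancelled tasks fail the CAS. */
    public boolean tryStartRestore() {
        // The CAS fails if the status is already RESTORING, RUNNING, or
        // CANCELLED, closing the "2nd restoreInternal" hole.
        return status.compareAndSet(Status.CREATED, Status.RESTORING);
    }

    /** Completes restore; fails if the task was cancelled mid-restore. */
    public boolean finishRestore() {
        return status.compareAndSet(Status.RESTORING, Status.RUNNING);
    }

    /** Cancellation wins unconditionally. */
    public void cancel() {
        status.set(Status.CANCELLED);
    }

    public Status current() {
        return status.get();
    }
}
```

With this shape, `restoreInternal` would start by calling `tryStartRestore()` and bail out if it returns `false`, instead of relying on callers never to invoke it twice.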