pnowojski commented on code in PR #19993:
URL: https://github.com/apache/flink/pull/19993#discussion_r899899066


##########
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriteRequest.java:
##########
@@ -109,6 +112,9 @@ static ChannelStateWriteRequest buildFutureWriteRequest(
                     }
                 },
                 throwable -> {
+                    if (!dataFuture.isDone()) {
+                        return;
+                    }

Review Comment:
   > Another question I would ask is why this can even result in a TM crash;
shouldn't the waiting be interrupted instead?
   
   Why should it be interrupted? We only use interrupts to wake up user 
code or 3rd party libraries. Our own code should be able to shut down cleanly 
without interruptions. We even explicitly disallow interrupts during `StreamTask` 
cleanup (`StreamTask#disableInterruptOnCancel`) once the task thread exits from 
user code, as otherwise this could lead to resource leaks. If we cannot clean 
up resources, we have to rely on the `TaskCancelerWatchDog`, which will fail 
over the whole TM after a timeout.
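
   To illustrate the idea, here is a minimal sketch of such an interrupt gate. 
It is not the actual `StreamTask` code: `InterruptGate`, `maybeInterrupt` and the 
field names are hypothetical, and only the method name `disableInterruptOnCancel` 
is borrowed from above. The canceler may interrupt the task thread only until the 
task thread itself closes the gate before starting cleanup.

```java
/** Hypothetical sketch of an interrupt gate; illustrative only, not Flink's implementation. */
final class InterruptGate {
    // Guards the flag so that "check and interrupt" is atomic with respect to disabling.
    private final Object lock = new Object();
    private boolean interruptAllowed = true;

    /** Called by the task thread once it has left user code and is about to clean up. */
    void disableInterruptOnCancel() {
        synchronized (lock) {
            interruptAllowed = false;
        }
    }

    /**
     * Called by a canceler thread. Interrupts the target only while user code
     * may still be running; returns whether an interrupt was actually sent.
     */
    boolean maybeInterrupt(Thread target) {
        synchronized (lock) {
            if (interruptAllowed) {
                target.interrupt();
                return true;
            }
            return false;
        }
    }
}
```

Holding the same lock in both methods means that an interrupt cannot slip in 
after the gate has been closed, so cleanup code never sees a late interrupt.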



