tillrohrmann commented on a change in pull request #14683: URL: https://github.com/apache/flink/pull/14683#discussion_r560143586
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointsCleaner.java
##########
@@ -62,8 +62,15 @@ public void cleanCheckpoint(
                         }
                     }
                 } finally {
-                    numberOfCheckpointsToClean.decrementAndGet();
-                    postCleanAction.run();
+                    try {
+                        numberOfCheckpointsToClean.decrementAndGet();
+                        postCleanAction.run();
+                    } catch (Exception e) {
+                        LOG.error(
+                                "Error while cleaning up checkpoint {}",
+                                checkpoint.getCheckpointID(),
+                                e);

Review comment:
The calling thread is the thread calling `CheckpointsCleaner.cleanCheckpoint`, right? Then this call will enqueue the cleanup action and eventually release the `checkpoint.lock`. That is when the cleanup action can complete.

I think that as long as we have made sure that the `CheckpointsCleaner` owns the `Checkpoint`, there should be no other thread accessing the `checkpoint.lock`. At the end of the day it probably boils down to proper lifecycle management of the involved objects and deciding who owns what.

But maybe I am overlooking something here and you have a concrete thread interleaving in mind that I don't see at the moment.
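To make the interleaving I have in mind concrete, here is a minimal sketch. It is not the Flink code: `SketchCheckpoint`, `SketchCleaner` and `runScenario` are hypothetical stand-ins, and it assumes that the calling thread holds the checkpoint's lock while it calls `cleanCheckpoint`. The cleanup action is only enqueued under the lock; it can acquire the lock and finish once the caller has left its `synchronized` block.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CleanupOwnershipSketch {

    /** Hypothetical stand-in for a checkpoint guarded by its own lock. */
    static final class SketchCheckpoint {
        final Object lock = new Object();
        boolean discarded = false; // guarded by lock
    }

    /** Hypothetical stand-in for the cleaner; it owns the checkpoint once handed over. */
    static final class SketchCleaner {
        final AtomicInteger numberOfCheckpointsToClean = new AtomicInteger();

        void cleanCheckpoint(
                SketchCheckpoint checkpoint, ExecutorService executor, Runnable postCleanAction) {
            numberOfCheckpointsToClean.incrementAndGet();
            // Only enqueue the work here; nothing in this method waits for the lock.
            executor.execute(
                    () -> {
                        try {
                            // Blocks until the calling thread has released the lock.
                            synchronized (checkpoint.lock) {
                                checkpoint.discarded = true;
                            }
                        } finally {
                            numberOfCheckpointsToClean.decrementAndGet();
                            postCleanAction.run();
                        }
                    });
        }
    }

    /** Runs the interleaving once; returns true if the cleanup completed as expected. */
    static boolean runScenario() {
        SketchCheckpoint checkpoint = new SketchCheckpoint();
        SketchCleaner cleaner = new SketchCleaner();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(1);
        try {
            synchronized (checkpoint.lock) {
                // Assumption: the caller holds the lock while handing over the checkpoint.
                cleaner.cleanCheckpoint(checkpoint, executor, done::countDown);
            } // Lock released here; from now on the cleanup action can complete.

            boolean finished = done.await(5, TimeUnit.SECONDS);
            synchronized (checkpoint.lock) {
                return finished
                        && checkpoint.discarded
                        && cleaner.numberOfCheckpointsToClean.get() == 0;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("cleanup completed: " + runScenario());
    }
}
```

In this sketch nothing deadlocks because the caller never waits for the cleanup action while holding the lock, and no third thread touches `checkpoint.lock` after the hand-over. If some other component kept using the checkpoint after passing it to the cleaner, that ownership assumption would no longer hold.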