yanghua commented on a change in pull request #8322: [FLINK-12364] Introduce a CheckpointFailureManager to centralized manage checkpoint failure URL: https://github.com/apache/flink/pull/8322#discussion_r284070033
########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java ########## @@ -435,6 +439,12 @@ public boolean triggerCheckpoint(long timestamp, boolean isPeriodic) { triggerCheckpoint(timestamp, checkpointProperties, null, isPeriodic, false); return true; } catch (CheckpointException e) { + try { + long latestGeneratedCheckpointId = getCheckpointIdCounter().getAndIncrement(); Review comment: > BTW, I have one more important additional point, which is the `numUnsuccessfulCheckpointsTriggers` in checkpoint coordinator, which absolutely sounds like something that should now be moved into the failure manager, wdyt? After we provided `CheckpointFailureManager`, IMO the `numUnsuccessfulCheckpointsTriggers` is not valuable. Currently, it is incremented in the trigger lock of method `triggerCheckpoint`. The domain of trigger lock is the subset of the trigger phase which been `CheckpointFailureManager` understood. So the counting is not correct. And it is just for logging purpose. So I suggest we could remove it in this PR or in the third step (next step). If the logging is really necessary, we could do it again after we implemented the new counting logic based on checkpoint id sequence. What's your opinion? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services