rkhachatryan commented on PR #20091: URL: https://github.com/apache/flink/pull/20091#issuecomment-1173500743
Thanks for the PR, @JesseAtSZ. I'm trying to understand why checkpoints are failing with `FINALIZE_CHECKPOINT_FAILURE` (which is ignored by `CheckpointFailureManager`) and not something like `IOException`. From the code, it looks like this can only happen in `CheckpointCoordinator`, after all the tasks have already acknowledged the checkpoint. That probably means that the job is stateless. Could you confirm that, @JesseAtSZ? If that's NOT the case, then we should probably fix the failure counting first.

Another question is related to the TM: do we need a symmetric check there (in `FsCheckpointStorageAccess.resolveCheckpointStorageLocation`)?
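
To make the failure-counting concern concrete, here is a minimal, simplified model of the behavior described above. It is not Flink's `CheckpointFailureManager` code: the class `FailureCounterSketch`, its `Reason` enum, and the choice of which reasons are ignored are assumptions made for illustration, based only on the statement that `FINALIZE_CHECKPOINT_FAILURE` is ignored.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified, hypothetical model of tolerable-failure counting.
// This is NOT Flink's CheckpointFailureManager; all names are illustrative.
class FailureCounterSketch {

    // Hypothetical subset of failure reasons, named after those in the comment.
    public enum Reason { FINALIZE_CHECKPOINT_FAILURE, IO_EXCEPTION }

    // Assumption for this sketch: these reasons never count toward the threshold.
    private static final Set<Reason> IGNORED =
            EnumSet.of(Reason.FINALIZE_CHECKPOINT_FAILURE);

    private final int tolerableFailures;
    private int continuousFailures;

    FailureCounterSketch(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    // Records a failed checkpoint; returns true once the threshold is exceeded.
    boolean onFailure(Reason reason) {
        if (IGNORED.contains(reason)) {
            return false; // ignored reasons never fail the job
        }
        continuousFailures++;
        return continuousFailures > tolerableFailures;
    }

    // A successful checkpoint resets the count of consecutive failures.
    void onSuccess() {
        continuousFailures = 0;
    }

    int getContinuousFailures() {
        return continuousFailures;
    }
}
```

In this model, a job whose checkpoints only ever fail with an ignored reason keeps running indefinitely without completing a checkpoint, which is why the reason reported for the failure matters.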