Hi,
We are running Flink 1.20 on Java 17 in a Kubernetes environment, with
checkpoints stored on S3 and the following checkpoint options:
execution.checkpointing.dir: s3://flink-application/checkpoints
execution.checkpointing.externalized-checkpoint-retention: DELETE_ON_CANCELLATION
execution.checkpointing.interval: 150000 ms
execution.checkpointing.min-pause: 30000 ms
execution.checkpointing.mode: EXACTLY_ONCE
execution.checkpointing.savepoint-dir: s3://flink-application/savepoints
execution.checkpointing.timeout: 10 min
execution.checkpointing.tolerable-failed-checkpoints: "3"
We have been through quite a few restarts of the Flink application, some
caused by streaming failures (mostly Kafka-related) and some by changes to
the application itself. After a restart the application is usually resumed
from a savepoint, but we have noticed a growing number of checkpoints left
behind under the checkpoint directory. Is there a built-in way to clean up
these obsolete checkpoints?
I suppose what we do not really understand is under which conditions Flink
does not clean up its checkpoints. Can someone explain?
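
To make the question concrete, here is a minimal sketch of how the leftover
directories could be listed. It assumes the usual layout of
<checkpoint-dir>/<job-id>/chk-<n>/ plus shared/ and taskowned/ under each job
ID, and it assumes the object keys and the set of running job IDs have
already been fetched by some other means. The function name and the key
prefix are hypothetical, and it only reports candidates; it deletes nothing.

```python
import re
from collections import defaultdict

# Assumed key layout under the bucket: checkpoints/<job-id>/chk-<n>/...
# (plus shared/ and taskowned/); Flink job IDs are 32 hex characters.
CHK_KEY = re.compile(
    r"^checkpoints/(?P<job>[0-9a-f]{32})/(?P<sub>chk-\d+|shared|taskowned)/"
)


def leftover_jobs(keys, running_job_ids):
    """Map each job ID that is not running to its sorted checkpoint numbers."""
    found = defaultdict(set)
    for key in keys:
        match = CHK_KEY.match(key)
        if match is None:
            continue  # not part of the checkpoint layout
        job = match.group("job")
        if job in running_job_ids:
            continue  # still owned by a live job, never a candidate
        sub = match.group("sub")
        found[job]  # record the job even if only shared/taskowned remain
        if sub.startswith("chk-"):
            found[job].add(int(sub[len("chk-"):]))
    return {job: sorted(numbers) for job, numbers in found.items()}
```

A directory reported here is only a candidate: a job restored from a retained
checkpoint may still reference files under the old job ID, so we would not
delete anything based on this listing alone.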
Thanks
JM