Q: How to best configure checkpoints to ensure they do not fill up the storage?

Jean-Marc Paulin Tue, 31 Dec 2024 03:58:51 -0800

Hi,

We are running Flink 1.20 on Java 17 in a k8s environment, with checkpoints
stored on S3 and the following checkpoint options:


    execution.checkpointing.dir: s3://flink-application/checkpoints
    execution.checkpointing.externalized-checkpoint-retention: DELETE_ON_CANCELLATION
    execution.checkpointing.interval: 150000 ms
    execution.checkpointing.min-pause: 30000 ms
    execution.checkpointing.mode: EXACTLY_ONCE
    execution.checkpointing.savepoint-dir: s3://flink-application/savepoints
    execution.checkpointing.timeout: 10 min
    execution.checkpointing.tolerable-failed-checkpoints: "3"

We have been through quite a few Flink application restarts, some due to
streaming failures for various reasons (mostly Kafka related) and some due
to Flink application changes. The application then tends to be resumed
from savepoints, but we have noticed that an increasing number of
checkpoints are left behind. Is there a built-in way of cleaning up these
obsolete checkpoints?

I suppose what we do not really understand is the condition(s) under which
Flink may not clean up checkpoints. Can someone explain?

Thanks

JM
