FW to user ML.

Hi Jean-Marc,

Could you elaborate on how you noticed that an increasing number of
checkpoints are left behind? Is the number of subdirectories under
s3://flink-application/checkpoints increasing? And have you set a state
TTL?
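
For reference, by "state TTL" I mean per-state expiry configured on the state
descriptors, roughly along the lines of the sketch below (assuming the
Duration-based builder available in recent Flink versions; the one-hour TTL
and the descriptor name are just placeholders):

    import java.time.Duration;
    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.state.ValueStateDescriptor;

    public class TtlExample {
        public static void main(String[] args) {
            // Expire entries one hour after the last write (placeholder TTL).
            StateTtlConfig ttlConfig = StateTtlConfig
                    .newBuilder(Duration.ofHours(1))
                    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                    .build();

            // Attach the TTL to a state descriptor used in a keyed function;
            // the descriptor name here is purely illustrative.
            ValueStateDescriptor<String> descriptor =
                    new ValueStateDescriptor<>("my-state", String.class);
            descriptor.enableTimeToLive(ttlConfig);
        }
    }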


Best,
Zakelly

On Thu, Jan 2, 2025 at 12:19 PM Zakelly Lan <zakelly....@gmail.com> wrote:

> Hi Jean-Marc,
>
> Could you elaborate on how you noticed that an increasing number of
> checkpoints are left behind? Is the number of subdirectories under
> s3://flink-application/checkpoints increasing? And have you set a state
> TTL?
>
>
> Best,
> Zakelly
>
> On Tue, Dec 31, 2024 at 7:58 PM Jean-Marc Paulin <jm.pau...@gmail.com>
> wrote:
>
>> Hi,
>>
>> We are on Flink 1.20/Java17 running in a k8s environment, with
>> checkpoints enabled on S3 and the following checkpoint options:
>>
>>     execution.checkpointing.dir: s3://flink-application/checkpoints
>>     execution.checkpointing.externalized-checkpoint-retention: DELETE_ON_CANCELLATION
>>     execution.checkpointing.interval: 150000 ms
>>     execution.checkpointing.min-pause: 30000 ms
>>     execution.checkpointing.mode: EXACTLY_ONCE
>>     execution.checkpointing.savepoint-dir: s3://flink-application/savepoints
>>     execution.checkpointing.timeout: 10 min
>>     execution.checkpointing.tolerable-failed-checkpoints: "3"
>>
>> We have been through quite a few Flink application restarts, due to
>> streaming failures for various reasons (mostly Kafka related) but also
>> Flink application changes. The application is then typically resumed from
>> a savepoint, but we noticed that an increasing number of checkpoints are
>> left behind. Is there a built-in way of cleaning up these obsolete
>> checkpoints?
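>>
>> For reference, the leftovers can be spotted simply by listing the
>> checkpoint prefix (Flink creates one subdirectory per job id, each
>> holding chk-<n> directories), e.g. with the AWS CLI:
>>
>>     aws s3 ls s3://flink-application/checkpoints/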
>>
>> I suppose what we do not really understand is the condition(s) under
>> which Flink may not clean up checkpoints. Can someone explain?
>>
>> Thanks
>>
>> JM
>>
>
