Hi Yang,

You can try setting
"execution.checkpointing.externalized-checkpoint-retention:
RETAIN_ON_CANCELLATION"[1] so that checkpoints are kept when the job is
cancelled, and increasing the value of "state.checkpoints.num-retained"[2]
so that more completed checkpoints are retained.
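
As a rough sketch, assuming the settings go into flink-conf.yaml (the value
10 is only an illustrative choice, not a recommendation):

```yaml
# Keep externalized checkpoints when the job is cancelled
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
# Maximum number of completed checkpoints to retain (the default is 1)
state.checkpoints.num-retained: 10
```

Note that checkpoints beyond the retained count are still cleaned up, so this
delays deletion of old checkpoints rather than switching it off.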


Here are the official documentation links for more details:

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#execution-checkpointing-externalized-checkpoint-retention

[2]
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#state-checkpoints-num-retained


Best,

Junrui

On Tue, Nov 7, 2023 at 22:02, Yang LI <yang.hunter...@gmail.com> wrote:

> Dear Flink Community,
>
> In our Flink application, we persist checkpoints to AWS S3. Recently,
> during periods of high job parallelism and traffic, we've experienced
> checkpoint failures. Upon investigating, it appears these may be related to
> S3 delete object requests interrupting checkpoint re-uploads, as evidenced
> by numerous InterruptedExceptions.
>
> We aim to explore options for disabling the deletion of stale checkpoints.
> Despite consulting the Flink configuration documentation and conducting
> various tests, the appropriate setting to prevent old checkpoint cleanup
> remains elusive.
>
> Could you advise if there's a method to disable the automatic cleanup of
> old Flink checkpoints?
>
> Best,
> Yang
>
