Hi Yang,
You can try configuring "execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION"[1] and increasing the value of "state.checkpoints.num-retained"[2] to retain more checkpoints. Here are the official documentation links for more details:

[1] https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#execution-checkpointing-externalized-checkpoint-retention
[2] https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#state-checkpoints-num-retained

Best,
Junrui

On Tue, Nov 7, 2023 at 22:02, Yang LI <yang.hunter...@gmail.com> wrote:

> Dear Flink Community,
>
> In our Flink application, we persist checkpoints to AWS S3. Recently,
> during periods of high job parallelism and traffic, we've experienced
> checkpoint failures. Upon investigating, it appears these may be related to
> S3 delete object requests interrupting checkpoint re-uploads, as evidenced
> by numerous InterruptedExceptions.
>
> We aim to explore options for disabling the deletion of stale checkpoints.
> Despite consulting the Flink configuration documentation and conducting
> various tests, the appropriate setting to prevent old checkpoint cleanup
> remains elusive.
>
> Could you advise if there's a method to disable the automatic cleanup of
> old Flink checkpoints?
>
> Best,
> Yang
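For reference, here is a minimal flink-conf.yaml sketch of the two settings suggested in this reply (externalized checkpoint retention and state.checkpoints.num-retained). The S3 path and the retained count of 10 are hypothetical placeholders, not recommendations; adjust them to your job and storage budget:

```yaml
# Hypothetical checkpoint location; replace with your own bucket and path.
state.checkpoints.dir: s3://your-bucket/flink-checkpoints

# Keep the externalized checkpoint when the job is cancelled.
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION

# Keep the 10 most recent completed checkpoints (Flink's default is 1).
# Checkpoints older than this count are still subsumed and deleted.
state.checkpoints.num-retained: 10
```

Note that a larger num-retained value means more checkpoint data is kept in S3 at any given time.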