Hi Jinzhong,

Sorry for the late reply. We previously switched from incremental to
non-incremental checkpoints; I think one of the reasons was the difficulty
of properly handling checkpoint cleanup on S3. With the Flink operator's
periodic savepoints, that may change. I'll re-test it, thanks for the
help!

Best,
Yang

On Wed, 8 Nov 2023 at 06:51, Jinzhong Li <lijinzhong2...@gmail.com> wrote:

> Hi Yang,
>
> I think there is no configuration option available that allows users to
> disable checkpoint file cleanup at runtime.
>
> Does your Flink application use incremental checkpoints?
> 1) If yes, I think leveraging S3's lifecycle management to clean up
> checkpoint files is not safe, because it may accidentally delete a file
> that is still in use, although the probability is small.
> 2) If no, you can try enabling incremental checkpoints and increasing the
> checkpoint interval to reduce the S3 traffic.
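>
> As a rough sketch of option 2, assuming Flink 1.18 configuration keys in
> flink-conf.yaml (the interval value is only an illustrative placeholder,
> not a recommendation):
>
> ```yaml
> # Upload only the state changed since the previous checkpoint
> # (applies to state backends that support it, such as RocksDB).
> state.backend.incremental: true
> # Checkpoint less often to reduce the number of S3 requests.
> execution.checkpointing.interval: 5min
> ```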
>
> On Wed, 8 Nov 2023 at 04:58, Yang LI <yang.hunter...@gmail.com> wrote:
>
>> Hi Martijn,
>>
>>
>> We're currently utilizing flink-s3-fs-presto. After reviewing the
>> flink-s3-fs-hadoop source code, I believe we would encounter similar issues
>> with it as well.
>>
>> When we say, 'The purpose of a checkpoint, in principle, is that Flink
>> manages its lifecycle,' I think it implies that the automatic cleanup of
>> old checkpoints is an integral part of Flink's lifecycle management.
>> However, is there a configuration option available that allows us to
>> disable this automatic cleanup? We're considering leveraging AWS S3's
>> lifecycle management capabilities to handle this aspect instead of relying
>> on Flink.
>>
>> Best,
>> Yang
>>
>> On Tue, 7 Nov 2023 at 18:44, Martijn Visser <martijnvis...@apache.org>
>> wrote:
>>
>>> Ah, I actually misread checkpoint and savepoints, sorry. The purpose
>>> of a checkpoint in principle is that Flink manages its lifecycle.
>>> Which S3 interface are you using for the checkpoint storage?
>>>
>>> On Tue, Nov 7, 2023 at 6:39 PM Martijn Visser <martijnvis...@apache.org>
>>> wrote:
>>> >
>>> > Hi Yang,
>>> >
>>> > If you use the NO_CLAIM mode, Flink will not assume ownership of a
>>> > snapshot and will leave it up to the user to delete it. See the blog [1]
>>> > for more details.
>>> >
>>> > Best regards,
>>> >
>>> > Martijn
>>> >
>>> > [1] https://flink.apache.org/2022/05/06/improvements-to-flink-operations-snapshots-ownership-and-savepoint-formats/#no_claim-default-mode
>>> >
>>> > On Tue, Nov 7, 2023 at 5:29 PM Junrui Lee <jrlee....@gmail.com> wrote:
>>> > >
>>> > > Hi Yang,
>>> > >
>>> > >
>>> > > You can try configuring
>>> > > "execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION"[1]
>>> > > and increasing the value of "state.checkpoints.num-retained"[2] to
>>> > > retain more checkpoints.
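>>> > >
>>> > > A minimal sketch of those two settings in flink-conf.yaml, assuming the
>>> > > Flink 1.18 keys from [1] and [2]; the retained count of 10 is an
>>> > > arbitrary example value:
>>> > >
>>> > > ```yaml
>>> > > # Keep externalized checkpoints when the job is cancelled.
>>> > > execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
>>> > > # Keep the 10 most recent completed checkpoints (the default is 1).
>>> > > state.checkpoints.num-retained: 10
>>> > > ```
>>> > >
>>> > > Note that checkpoints older than the retained count are still deleted
>>> > > by Flink, so this widens the retention window rather than turning
>>> > > cleanup off.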
>>> > >
>>> > >
>>> > > Here are the official documentation links for more details:
>>> > >
>>> > > [1] https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#execution-checkpointing-externalized-checkpoint-retention
>>> > >
>>> > > [2] https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#state-checkpoints-num-retained
>>> > >
>>> > >
>>> > > Best,
>>> > >
>>> > > Junrui
>>> > >
>>> > >
>>> > > On Tue, 7 Nov 2023 at 22:02, Yang LI <yang.hunter...@gmail.com> wrote:
>>> > >>
>>> > >> Dear Flink Community,
>>> > >>
>>> > >> In our Flink application, we persist checkpoints to AWS S3.
>>> > >> Recently, during periods of high job parallelism and traffic, we've
>>> > >> experienced checkpoint failures. Upon investigating, it appears these
>>> > >> may be related to S3 delete object requests interrupting checkpoint
>>> > >> re-uploads, as evidenced by numerous InterruptedExceptions.
>>> > >>
>>> > >> We aim to explore options for disabling the deletion of stale
>>> > >> checkpoints. Despite consulting the Flink configuration documentation
>>> > >> and conducting various tests, the appropriate setting to prevent old
>>> > >> checkpoint cleanup remains elusive.
>>> > >>
>>> > >> Could you advise if there's a method to disable the automatic
>>> > >> cleanup of old Flink checkpoints?
>>> > >>
>>> > >> Best,
>>> > >> Yang
>>>
>>
