Re: Discard checkpoint files through a single recursive call

Piotr Nowojski Fri, 18 Jun 2021 02:34:19 -0700

Hi,

Unfortunately, I don't think there are any plans to push for this at the
moment. I would suggest bumping or voting on
https://issues.apache.org/jira/browse/FLINK-13856 so that we can
prioritise our efforts more accurately.


Best,
Piotrek

On Wed, 16 Jun 2021 at 05:46, Jiahui Jiang <qzhzm173...@hotmail.com> wrote:

> Hello Yun and Guowei,
>
> Thanks for the context! Looks like the plan is to have a Flink config flag
> to enable recursive deletion? Is there any plan to push through this PR in
> the next release? https://github.com/apache/flink/pull/9602
>
>
> Thank you so much!
> Jiahui
> ------------------------------
> *From:* Yun Tang <myas...@live.com>
> *Sent:* Tuesday, June 15, 2021 10:27 PM
> *To:* Guowei Ma <guowei....@gmail.com>; Jiahui Jiang <
> qzhzm173...@hotmail.com>
> *Cc:* user@flink.apache.org <user@flink.apache.org>
> *Subject:* Re: Discard checkpoint files through a single recursive call
>
> Hi Jiang,
>
> Please take a look at FLINK-17860 and FLINK-13856 for previous discussion
> of this problem.
>
> [1] https://issues.apache.org/jira/browse/FLINK-17860
> [2] https://issues.apache.org/jira/browse/FLINK-13856
>
> Best
> Yun Tang
>
> ------------------------------
> *From:* Guowei Ma <guowei....@gmail.com>
> *Sent:* Wednesday, June 16, 2021 8:40
> *To:* Jiahui Jiang <qzhzm173...@hotmail.com>
> *Cc:* user@flink.apache.org <user@flink.apache.org>
> *Subject:* Re: Discard checkpoint files through a single recursive call
>
> Hi Jiang,
>
> I want to make sure I understand you correctly, so could you elaborate on
> how you want to change it? For example, which interface or class do you
> want to add a method to?
> Although I am not a state expert, as far as I know, because of incremental
> checkpoints, the discardState method of each state has to be called when a
> CompletedCheckpoint is discarded.
>
> Best,
> Guowei
>
>
> On Tue, Jun 15, 2021 at 7:37 AM Jiahui Jiang <qzhzm173...@hotmail.com>
> wrote:
>
> Hello Flink!
>
> We are building an infrastructure where we implement our own
> CompletedCheckpointStore. Reads and writes to the external storage
> location of these checkpoints go through HTTP calls to an external service.
>
> Recently we noticed a performance issue with checkpoint file cleanup when
> the job writes a very large number of files per checkpoint.
> (In our case we had a few hundred operators and ran with a parallelism of
> 16.)
> During the checkpoint state discard phase, since the implementation in
> CompletedCheckpoint discards the state files one by one, we are seeing a
> very high number of remote calls. Sometimes the deletion fails to keep up
> with the checkpoint progress.
>
> Given that the interface we are given to configure the external storage
> location for checkpoints is always a `target directory`, would it be
> reasonable to expose an implementation of discard() that directly calls
> disposeStorageLocation with recursive set to true, without first iterating
> over each individual file? Are there any blockers for that?
>
> Thank you!
>
>
> links
>
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CompletedCheckpoint.java#L240
>
> https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-runtime/src/main/java/org/apache/flink/runtime/state/filesystem/FsCompletedCheckpointStorageLocation.java#L70
>
>
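
For readers of this thread, the difference between the two discard strategies discussed in the original question can be sketched roughly as follows. This is only an illustration, not Flink code: `RemoteStorage`, `CountingStorage`, `discardOneByOne`, `discardRecursively` and `stateFiles` are hypothetical names, the figure of 300 operators stands in for "a few hundred", and the sketch assumes each subtask writes exactly one state file.

```java
import java.util.ArrayList;
import java.util.List;

public class DiscardSketch {

    /** Hypothetical stand-in for the HTTP-backed storage service (not a Flink interface). */
    interface RemoteStorage {
        void delete(String path, boolean recursive);
    }

    /** Fake storage that only counts how many remote calls would be issued. */
    static final class CountingStorage implements RemoteStorage {
        int calls = 0;

        @Override
        public void delete(String path, boolean recursive) {
            calls++;
        }
    }

    /** Per-file discard: one call per state file, then one for the now-empty directory. */
    static void discardOneByOne(RemoteStorage storage, String dir, List<String> files) {
        for (String file : files) {
            storage.delete(dir + "/" + file, false);
        }
        storage.delete(dir, false);
    }

    /** Recursive discard: a single call that removes the whole checkpoint directory. */
    static void discardRecursively(RemoteStorage storage, String dir) {
        storage.delete(dir, true);
    }

    /** Assumption for illustration: every subtask writes exactly one state file. */
    static List<String> stateFiles(int operators, int parallelism) {
        List<String> files = new ArrayList<>();
        for (int op = 0; op < operators; op++) {
            for (int subtask = 0; subtask < parallelism; subtask++) {
                files.add("op-" + op + "-subtask-" + subtask);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> files = stateFiles(300, 16);

        CountingStorage perFile = new CountingStorage();
        discardOneByOne(perFile, "chk-42", files);

        CountingStorage recursive = new CountingStorage();
        discardRecursively(recursive, "chk-42");

        System.out.println("per-file calls:  " + perFile.calls);
        System.out.println("recursive calls: " + recursive.calls);
    }
}
```

Under these assumptions the per-file strategy issues 4801 remote calls per checkpoint and the recursive one issues 1. The recursive shortcut is only safe when the directory contains nothing but that checkpoint's own files; as Guowei points out above, with incremental checkpoints the discardState method of each state still has to be called.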
