Re: Clarification on checkpoint cleanup with RETAIN_ON_CANCELLATION

Hangxiang Yu Mon, 12 Dec 2022 01:31:46 -0800

Hi Alexis.
IIUC, by default the job ID of the new job should be different if you
restore from a stopped job. Whether the old checkpoint files get cleaned up
depends on the savepoint restore mode.
Only in the case of failover should the job ID stay the same, and then
everything in the checkpoint dir will be claimed, as you said.


> And a related question for a slightly different scenario, if I
> use ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION and trigger a
> stop-job-with-savepoint, does that trigger checkpoint deletion?
In this case, the checkpoint will be cleaned up rather than retained, while
the savepoint remains, so you can still restore from the savepoint.
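
To make the directory layout from the question concrete, here is a minimal
read-only sketch in Python that lists the chk-* folders older than a given
checkpoint id. The function name is hypothetical, and it assumes the layout
mentioned in this thread (chk-<n>/, shared/ and taskowned/ under the job id
directory). It only lists candidates: whether any of them is actually safe to
delete depends on the restore mode and on whether newer checkpoints still
reference older files, which this sketch does not check.

```python
import re
from pathlib import Path

# Assumed layout (taken from the thread, not verified against Flink):
#   <checkpoint-dir>/<job_id>/chk-<n>/, shared/, taskowned/
CHK_DIR = re.compile(r"chk-(\d+)")


def older_checkpoint_dirs(job_dir, latest_id):
    """Return chk-<n> directories under job_dir with n < latest_id, sorted by n.

    Read-only: nothing is deleted. shared/ and taskowned/ never match the
    chk-<n> pattern, so they are never returned.
    """
    found = []
    for entry in Path(job_dir).iterdir():
        match = CHK_DIR.fullmatch(entry.name)
        if entry.is_dir() and match and int(match.group(1)) < latest_id:
            found.append((int(match.group(1)), entry))
    # Sort numerically so that chk-10 comes after chk-9.
    return [path for _, path in sorted(found, key=lambda item: item[0])]
```

Calling older_checkpoint_dirs(job_dir, 1234) on the job id directory would
return the chk-* folders with an id below 1234, which can then be reviewed by
hand before anything is removed.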

On Mon, Dec 5, 2022 at 6:33 PM Alexis Sarda-Espinosa <
[email protected]> wrote:

> Hello,
>
> I have a doubt about a very particular scenario with this configuration:
>
> - Flink HA enabled (Kubernetes).
> - ExternalizedCheckpointCleanup set to RETAIN_ON_CANCELLATION.
> - Savepoint restore mode left as default NO_CLAIM.
>
> During an upgrade, a stop-job-with-savepoint is triggered, and then that
> savepoint is used to start the upgraded job. Based on what I see, since HA
> is enabled, the job ID doesn't change. Additionally, I understand the first
> checkpoint after restoration will be a full one so that there's no
> dependency on the used savepoint. However, since the job ID didn't change,
> the new checkpoint still shares a path with "older" checkpoints, e.g.
> /.../job_id/chk-1234.
>
> In this case, does this mean everything under /.../job_id/ *except*
> shared/, taskowned/, and any chk-*/ folder whose id is smaller than 1234
> could be deleted? I imagine even some files under shared/ could be deleted
> as well, although that might be harder to identify.
>
> And a related question for a slightly different scenario, if I
> use ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION and trigger a
> stop-job-with-savepoint, does that trigger checkpoint deletion?
>
> Regards,
> Alexis.
>


-- 
Best,
Hangxiang.
