Hi Hangxiang,

After some more digging, I think the job ID is preserved not because of
Flink HA, but because of the Kubernetes operator. It seems to me that the
"savepoint" upgrade mode should ideally change the job ID when starting from
the savepoint, but I'm not sure.
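
As long as the job ID stays the same, the directory question from my first
message (quoted below) still applies. The following is only a minimal sketch
of how the candidate folders could be listed, assuming the intended reading
is "chk-* folders whose id is below that of the first checkpoint taken after
the restore". The function name, its parameters, and the folder-name pattern
are hypothetical and not part of Flink. The sketch only lists folders and
deletes nothing, since whether deleting them is actually safe is exactly the
open question.

```python
import re
from pathlib import Path

# Assumed naming scheme for checkpoint folders, e.g. "chk-1234".
CHK_DIR = re.compile(r"chk-(\d+)")


def older_checkpoint_dirs(job_dir, first_new_id):
    """List chk-* subfolders of job_dir whose numeric id is below first_new_id.

    Hypothetical helper: it only reports candidates and never deletes anything.
    """
    candidates = []
    for entry in Path(job_dir).iterdir():
        match = CHK_DIR.fullmatch(entry.name)
        # shared/ and taskowned/ do not match the pattern, so they are skipped.
        if entry.is_dir() and match and int(match.group(1)) < first_new_id:
            candidates.append(entry)
    # Sort by numeric checkpoint id rather than lexicographically.
    return sorted(candidates, key=lambda p: int(p.name.split("-")[1]))
```

For a local checkpoint directory, older_checkpoint_dirs(job_dir, 1234) would
return the chk-* folders numbered below 1234 and leave shared/ and taskowned/
out of the result.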

Regards,
Alexis.

On Mon, Dec 12, 2022 at 10:31 AM Hangxiang Yu <master...@gmail.com> wrote:

> Hi Alexis.
> IIUC, by default, the job ID of the new job should be different if you
> restore from a stopped job. Whether to clean up depends on the savepoint
> restore mode.
> Only in the case of failover should the job ID stay the same, and everything
> in the checkpoint dir will be claimed, as you said.
>
> > And a related question for a slightly different scenario, if I
> > use ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION and trigger a
> > stop-job-with-savepoint, does that trigger checkpoint deletion?
> In this case, the checkpoint will be cleaned up rather than retained, while
> the savepoint will remain, so you can still use the savepoint to restore.
>
> On Mon, Dec 5, 2022 at 6:33 PM Alexis Sarda-Espinosa <
> sarda.espin...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a question about a very particular scenario with this configuration:
>>
>> - Flink HA enabled (Kubernetes).
>> - ExternalizedCheckpointCleanup set to RETAIN_ON_CANCELLATION.
>> - Savepoint restore mode left as default NO_CLAIM.
>>
>> During an upgrade, a stop-job-with-savepoint is triggered, and then that
>> savepoint is used to start the upgraded job. Based on what I see, since HA
>> is enabled, the job ID doesn't change. Additionally, I understand the first
>> checkpoint after restoration will be a full one so that there's no
>> dependency on the used savepoint. However, since the job ID didn't change,
>> the new checkpoint still shares a path with "older" checkpoints, e.g.
>> /.../job_id/chk-1234.
>>
>> In this case, does this mean everything under /.../job_id/ *except*
>> shared/, taskowned/, and any chk-*/ folder whose id is smaller than 1234
>> could be deleted? I imagine even some files under shared/ could be deleted
>> as well, although that might be harder to identify.
>>
>> And a related question for a slightly different scenario, if I
>> use ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION and trigger a
>> stop-job-with-savepoint, does that trigger checkpoint deletion?
>>
>> Regards,
>> Alexis.
>>
>
>
> --
> Best,
> Hangxiang.
>
