From the implementation of DefaultCompletedCheckpointStore, Flink will only
retain the configured number of checkpoints.
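
As far as I know, that retained count is controlled by the
state.checkpoints.num-retained option in flink-conf.yaml (it defaults to 1):

    state.checkpoints.num-retained: 1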

Maybe you could also check the content of the jobmanager-leader ConfigMap. It
should contain the same number of pointers to the completedCheckpointxxxx files.
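
For example, something like the following (the ConfigMap name here is a
guess; it depends on your kubernetes.cluster-id and the job id):

    kubectl get configmap <cluster-id>-<job-id>-jobmanager-leader -o yaml \
        | grep completedCheckpoint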


Best,
Yang

Ivan Yang <ivanygy...@gmail.com> wrote on Thu, Jun 24, 2021 at 2:25 AM:

> Thanks for the reply. Yes, we are seeing all the completedCheckpointxxxx
> files, and they keep growing. We will revisit our k8s setup, ConfigMap, etc.
>
> On Jun 23, 2021, at 2:09 AM, Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Ivan,
>
> Regarding the completedCheckpointxxxx files that keep growing, do you mean
> that too many files exist in the S3 bucket?
>
> AFAIK, if the K8s HA services work normally, only
> one completedCheckpointxxxx file will be retained. Once a
> new one is generated, the old one will be deleted.
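>
> You could double-check how many actually exist with something like this
> (assuming the storageDir from your mail):
>
>     aws s3 ls --recursive s3://some-bucket/recovery/ | grep -c completedCheckpoint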
>
>
> Best,
> Yang
>
> Ivan Yang <ivanygy...@gmail.com> wrote on Wed, Jun 23, 2021 at 12:31 AM:
>
>> Hi Dear Flink users,
>>
>> We recently enabled the ZooKeeper-less HA in our Kubernetes Flink
>> deployment. The setup has
>>
>> high-availability.storageDir: s3://some-bucket/recovery
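>>
>> For reference, the rest of the HA-related config looks roughly like this
>> (the cluster id below is a placeholder; the factory class is the Flink
>> 1.12+ Kubernetes one):
>>
>>     high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>>     kubernetes.cluster-id: my-flink-cluster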
>>
>>
>> We have a relatively short (7-day) retention policy on the S3 bucket, so
>> HA recovery will fail once the submittedJobGraphxxxxxx file is deleted by
>> S3. If we remove the retention policy, the completedCheckpointxxxx files
>> will keep growing. The only way I can think of is to use a pattern-based
>> file retention policy in S3. Before I do that, are there any config keys
>> available in Flink that I can tune so HA does not keep all the
>> completedCheckpoint* files?
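>>
>> For concreteness, the pattern-based rule I have in mind is an S3 lifecycle
>> rule scoped to a prefix, roughly like this (the bucket name and prefix
>> below are placeholders):
>>
>>     aws s3api put-bucket-lifecycle-configuration --bucket some-bucket \
>>       --lifecycle-configuration '{
>>         "Rules": [{
>>           "ID": "expire-completed-checkpoints",
>>           "Filter": { "Prefix": "recovery/default/completedCheckpoint" },
>>           "Status": "Enabled",
>>           "Expiration": { "Days": 7 }
>>         }]
>>       }'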
>>
>> Thanks,
>> Ivan
>>
>
>
