From the implementation of DefaultCompletedCheckpointStore, Flink will only retain the configured number of checkpoints.
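For reference, the retained count mentioned above is controlled by `state.checkpoints.num-retained` (default 1). A minimal flink-conf.yaml sketch, assuming Flink 1.13-style Kubernetes HA keys and a hypothetical cluster id:

```yaml
# Sketch of the relevant flink-conf.yaml keys (verify against your Flink version).
high-availability: kubernetes
high-availability.storageDir: s3://some-bucket/recovery
kubernetes.cluster-id: my-flink-cluster   # hypothetical id, not from the thread
# How many completed checkpoints the DefaultCompletedCheckpointStore keeps.
# With the default of 1, only one completedCheckpointxxxx file should remain
# in the HA storage dir once the old one is cleaned up.
state.checkpoints.num-retained: 1
```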
Maybe you could also check the content of the jobmanager-leader ConfigMap. It should contain the same number of pointers to the completedCheckpointxxxx files.

Best,
Yang

Ivan Yang <ivanygy...@gmail.com> wrote on Thursday, June 24, 2021 at 2:25 AM:

> Thanks for the reply. Yes, we are seeing all the completedCheckpointxxxx
> files, and they keep growing. We will revisit our k8s setup, ConfigMap, etc.
>
> On Jun 23, 2021, at 2:09 AM, Yang Wang <danrtsey...@gmail.com> wrote:
>
> Hi Ivan,
>
> Regarding the completedCheckpointxxxx files that keep growing: do you mean
> too many files exist in the S3 bucket?
>
> AFAIK, if the K8s HA services work normally, only
> one completedCheckpointxxxx file will be retained. Once a
> new one is generated, the old one will be deleted.
>
> Best,
> Yang
>
> Ivan Yang <ivanygy...@gmail.com> wrote on Wednesday, June 23, 2021 at 12:31 AM:
>
>> Hi Dear Flink users,
>>
>> We recently enabled the ZooKeeper-less HA in our Kubernetes
>> Flink deployment. The setup has:
>>
>> high-availability.storageDir: s3://some-bucket/recovery
>>
>> Since we have a relatively short (7-day) retention policy on the S3
>> bucket, HA will fail if the submittedJobGraphxxxxxx file is deleted by
>> S3. If we remove the retention policy, the completedCheckpointxxxx files
>> will keep growing. The only way I can think of is to use a pattern-based
>> file retention policy in S3. Before I do that, is there any config key
>> available in Flink that I can tune so that HA does not keep all the
>> completedCheckpoint* files?
>>
>> Thanks,
>> Ivan
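If the S3 route is still needed, note that S3 lifecycle rules filter by key prefix rather than arbitrary patterns, so one option is to scope expiration to the completedCheckpoint objects only, leaving submittedJobGraph files untouched. A sketch of such a lifecycle configuration, assuming the `recovery/default/` layout from the thread (the exact prefix is an assumption, check your bucket):

```json
{
  "Rules": [
    {
      "ID": "expire-stale-completed-checkpoints",
      "Status": "Enabled",
      "Filter": { "Prefix": "recovery/default/completedCheckpoint" },
      "Expiration": { "Days": 7 }
    }
  ]
}
```

Be aware that expiring a completedCheckpoint file that the HA metadata still points to would break recovery, so this is only safe if Flink's own cleanup is lagging rather than the pointer set growing.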