Thanks for the reply. Yes, we are seeing all the completedCheckpointxxxx files, 
and they keep growing. We will revisit our K8s setup, ConfigMaps, etc.
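
For anyone checking the same thing, this is roughly how we are inspecting the HA 
ConfigMaps (a minimal sketch; the namespace and ConfigMap names depend on your 
kubernetes.cluster-id and job id, so the ones below are illustrative):

    # List the HA ConfigMaps Flink created for this cluster
    kubectl -n flink get configmaps | grep my-flink-cluster

    # Dump the leader/checkpoint metadata stored for a given job
    kubectl -n flink get configmap my-flink-cluster-<job-id>-jobmanager-leader -o yaml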

> On Jun 23, 2021, at 2:09 AM, Yang Wang <danrtsey...@gmail.com> wrote:
> 
> Hi Ivan,
> 
> Regarding the completedCheckpointxxxx files that keep growing: do you mean too many 
> files exist in the S3 bucket?
> 
> AFAIK, if the K8s HA services work normally, only one completedCheckpointxxxx 
> file will be retained. Once a
> new one is generated, the old one will be deleted.
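> 
> The number of retained completed checkpoints (and therefore the number of 
> completedCheckpointxxxx files the HA metadata points to) is controlled by 
> state.checkpoints.num-retained, which defaults to 1. A minimal flink-conf.yaml 
> sketch:
> 
>     # flink-conf.yaml -- keep only the most recent completed checkpoint
>     state.checkpoints.num-retained: 1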
> 
> 
> Best,
> Yang
> 
> Ivan Yang <ivanygy...@gmail.com> wrote on Wed, Jun 23, 2021 at 12:31 AM:
> Dear Flink users,
> 
> We recently enabled the ZooKeeper-less (Kubernetes) HA in our Kubernetes Flink 
> deployment. The setup has
> high-availability.storageDir: s3://some-bucket/recovery
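> 
> For reference, the full set of HA keys looks roughly like this (a sketch, 
> assuming Flink 1.12+; the cluster id and bucket are placeholders):
> 
>     # flink-conf.yaml -- Kubernetes (ZooKeeper-less) HA
>     kubernetes.cluster-id: my-flink-cluster
>     high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>     high-availability.storageDir: s3://some-bucket/recovery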
> 
> We have a relatively short (7-day) retention policy on the S3 bucket, so HA 
> will fail if the submittedJobGraphxxxxxx file is deleted by S3. If we remove 
> the retention policy, the completedCheckpointxxxx files will keep growing. The 
> only way I can think of is to use a pattern-based file retention policy in S3. 
> Before I do that, is there any config key available in Flink that I can tune 
> so it does not keep all the completedCheckpoint* files in HA?
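> 
> If the pattern-based route is what we end up with, the sketch I have in mind 
> is an S3 lifecycle rule scoped to the checkpoint prefix (AWS CLI; the bucket 
> name and the 7-day window are our values, and this assumes the job checkpoints 
> far more often than the expiration window, so the one live completedCheckpoint 
> file is never old enough to be expired):
> 
>     # lifecycle.json -- expire only objects under the completedCheckpoint prefix
>     {
>       "Rules": [
>         {
>           "ID": "expire-old-completed-checkpoints",
>           "Status": "Enabled",
>           "Filter": { "Prefix": "recovery/default/completedCheckpoint" },
>           "Expiration": { "Days": 7 }
>         }
>       ]
>     }
> 
>     aws s3api put-bucket-lifecycle-configuration \
>       --bucket some-bucket --lifecycle-configuration file://lifecycle.json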
> 
> Thanks,
> Ivan
> 
