Flink Kubernetes HA

Ivan Yang Tue, 22 Jun 2021 09:30:49 -0700

Hi Dear Flink users,

We recently enabled the ZooKeeper-less HA in our Kubernetes Flink
deployment. The setup has:
high-availability.storageDir: s3://some-bucket/recovery 

We have a relatively short retention policy (7 days) on the S3 bucket, so HA
recovery fails if the submittedJobGraphxxxxxx file
<https://s3.console.aws.amazon.com/s3/object/flink-checkpointing-prod-eu-central-1?region=eu-central-1&prefix=recovery/default/submittedJobGraph5b30c5214899>
is deleted by S3. If we remove the retention policy, the completedCheckpointxxxx
files
<https://s3.console.aws.amazon.com/s3/object/flink-checkpointing-prod-eu-central-1?prefix=recovery/default/completedCheckpoint001fd6e39810>
keep growing. The only option I can think of is a pattern-based file retention
policy in S3. Before I do that, are there any config keys in Flink I can tune so
that not all of the completedCheckpoint* files are kept in the HA storage?
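
A minimal sketch of the pattern-based S3 policy described above, assuming the HA files live under the recovery/default/ key prefix seen in the links and that the rule is applied with boto3. The bucket name, rule ID, and the helper names build_lifecycle_config and apply_lifecycle_config are hypothetical. S3 lifecycle filters match a literal key prefix rather than a glob, so a rule on recovery/default/completedCheckpoint leaves the submittedJobGraph* files alone; the existing 7-day rule would have to be narrowed or removed for this to help.

```python
"""Sketch: an S3 lifecycle rule scoped by key prefix so that only the
completedCheckpoint* files under the HA storage dir expire.

Assumptions (hypothetical, not from Flink): the "recovery/default/"
prefix layout, the bucket name, the rule ID, and the 7-day window.
"""
from typing import Any, Dict


def build_lifecycle_config(ha_prefix: str, days: int) -> Dict[str, Any]:
    """Return a LifecycleConfiguration that expires only keys starting with
    <ha_prefix>completedCheckpoint. S3 filters match a literal key prefix,
    not a glob, so submittedJobGraph* keys are not matched by this rule."""
    if days < 1:
        raise ValueError("S3 expiration needs at least 1 day")
    return {
        "Rules": [
            {
                "ID": "expire-flink-ha-completed-checkpoints",  # hypothetical ID
                "Filter": {"Prefix": f"{ha_prefix}completedCheckpoint"},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle_config(bucket: str, config: Dict[str, Any]) -> None:
    """Push the rule with boto3. This REPLACES the bucket's whole lifecycle
    configuration, so any existing rules must be merged in first."""
    import boto3  # imported lazily so the builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    cfg = build_lifecycle_config("recovery/default/", days=7)
    print(cfg["Rules"][0]["Filter"])
    # apply_lifecycle_config("some-bucket", cfg)  # needs AWS credentials
```

One caveat under these assumptions: expiration is by object age, so if a job stops checkpointing for longer than the window (for example while suspended), the completedCheckpoint file its HA metadata still points to could be deleted as well, with the same failure mode as the submittedJobGraph case.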

Thanks,
Ivan
