Can you please share your configs? I'm using native Kubernetes without HA and there are no issues, so I'm curious how this happens. AFAIK the job ID is generated randomly.
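For reference, a standalone Kubernetes HA setup is typically driven by a handful of flink-conf.yaml options like the sketch below. This is only an illustrative fragment (key names from the Flink docs; the cluster ID and storage paths are placeholders, not values from this thread):

```yaml
# Enable the Kubernetes-based HA services (leader election via ConfigMaps).
high-availability: kubernetes

# Unique identifier for the cluster; HA ConfigMaps are named after it.
kubernetes.cluster-id: my-flink-app

# Durable storage for HA metadata (JobGraphs, checkpoint pointers).
high-availability.storageDir: s3://my-bucket/flink-ha

# Checkpoint directory -- note that checkpoints land under
# <this path>/<job id>/chk-N, which is where the collision in the
# quoted error occurs when the job ID is always all zeros.
state.checkpoints.dir: s3://my-bucket/flink-checkpoints
```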
Harsh Shah <harsh.a.s...@shopify.com> wrote on Wed, Aug 4, 2021 at 2:44 AM:

> Hello,
>
> I am trying to use Flink HA mode inside kubernetes
> <https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/>
> in standalone
> <https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/overview/#application-mode>
> mode. The Job ID is always constant, "00000000000000000000000000000000". In
> situations where we restart the job (not from a checkpoint or savepoint),
> we see errors like
> """
> Caused by: org.apache.hadoop.fs.FileAlreadyExistsException:
> '<PATH>/flink-checkpoints/00000000000000000000000000000000/chk-1/_metadata'
> already exists
> """
> even though no checkpoints have been created since the restart of the job.
>
> My questions:
> * Is the recommended way to set a new unique "checkpoint path" every time
> we update the job and restart the necessary k8s resources (say, when not
> restarting from a checkpoint or savepoint)? Or should we GC checkpoints
> during deletion and reload from a savepoint if required? Looking for a
> standard recommendation.
> * Is there a way I can override the JobID to be unique and indicate it is
> a complete restart in HA mode?
>
> Thanks,
> Harsh
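On the second question, one approach worth checking against your Flink version's docs: generate a fresh 32-hex-character job ID per deployment and pass it to the standalone application-mode entrypoint, so each deploy gets its own checkpoint subdirectory. A minimal sketch, assuming the entrypoint accepts a `--job-id` argument (verify this for your version; the job class name below is a placeholder):

```shell
# Generate a fresh 32-hex-char Flink job ID so checkpoints no longer
# land under the constant 00000000000000000000000000000000 directory.
JOB_ID=$(uuidgen | tr -d '-' | tr 'A-Z' 'a-z')
echo "$JOB_ID"

# Hypothetical invocation of the standalone application-mode entrypoint
# (flag name assumed; confirm against your Flink version):
# ./bin/standalone-job.sh start --job-classname com.example.MyJob --job-id "$JOB_ID"
```

With a per-deploy job ID, an intentional full restart writes to a new `<checkpoint dir>/<job id>/` path instead of colliding with the previous run's chk-1 metadata.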