Can you please share your configs? I'm using native Kubernetes without HA and there are no issues, so I'm curious how this happens. AFAIK the job ID is generated randomly.
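For reference, a standalone Kubernetes HA setup is typically driven by a handful of flink-conf.yaml options like the sketch below. This is only an illustrative fragment (key names from the Flink docs; the cluster ID and storage paths are placeholders, not values from this thread):

```yaml
# Enable the Kubernetes-based HA services (leader election via ConfigMaps).
high-availability: kubernetes

# Unique identifier for the cluster; HA ConfigMaps are named after it.
kubernetes.cluster-id: my-flink-app

# Durable storage for HA metadata (JobGraphs, checkpoint pointers).
high-availability.storageDir: s3://my-bucket/flink-ha

# Checkpoint directory -- note that checkpoints land under
# <this path>/<job id>/chk-N, which is where the collision in the
# quoted error occurs when the job ID is always all zeros.
state.checkpoints.dir: s3://my-bucket/flink-checkpoints
```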
Harsh Shah <harsh.a.s...@shopify.com> wrote on Wed, Aug 4, 2021 at 2:44 AM:

> Hello,
>
> I am trying to use Flink HA mode inside kubernetes
> <https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/>
> in standalone
> <https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/overview/#application-mode>
> mode. The Job ID is always constant, "00000000000000000000000000000000". In
> situations where we restart the job (not from a checkpoint or savepoint),
> we see errors like
> """
> Caused by: org.apache.hadoop.fs.FileAlreadyExistsException:
> '<PATH>/flink-checkpoints/00000000000000000000000000000000/chk-1/_metadata'
> already exists
> """
> even though no checkpoints have been created since the restart of the job.
>
> My questions:
> * Is the recommended way to set a new unique "checkpoint path" every time
> we update the job and restart the necessary k8s resources (say, when not
> restarting from a checkpoint or savepoint)? Or should we GC checkpoints
> during deletion and reload from a savepoint if required? Looking for a
> standard recommendation.
> * Is there a way I can override the JobID to be unique and indicate it is
> a complete restart in HA mode?
>
> Thanks,
> Harsh
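On the second question, one approach worth checking against your Flink version's docs: generate a fresh 32-hex-character job ID per deployment and pass it to the standalone application-mode entrypoint, so each deploy gets its own checkpoint subdirectory. A minimal sketch, assuming the entrypoint accepts a `--job-id` argument (verify this for your version; the job class name below is a placeholder):

```shell
# Generate a fresh 32-hex-char Flink job ID so checkpoints no longer
# land under the constant 00000000000000000000000000000000 directory.
JOB_ID=$(uuidgen | tr -d '-' | tr 'A-Z' 'a-z')
echo "$JOB_ID"

# Hypothetical invocation of the standalone application-mode entrypoint
# (flag name assumed; confirm against your Flink version):
# ./bin/standalone-job.sh start --job-classname com.example.MyJob --job-id "$JOB_ID"
```

With a per-deploy job ID, an intentional full restart writes to a new `<checkpoint dir>/<job id>/` path instead of colliding with the previous run's chk-1 metadata.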