This is a known issue; please see [1] for more information. It has already been fixed on the master and 1.12 branches, and the fix will be included in the next minor Flink release (1.12.1). Perhaps you could help verify it.
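As a quick sanity check while verifying, you can confirm whether the leader ConfigMap the error complains about actually exists in your namespace. A minimal sketch, assuming the 1.12 Kubernetes HA naming scheme `<cluster-id>-<job-id>-jobmanager-leader` and reusing the cluster-id and job id that appear in the report (both are placeholders for your real values):

```shell
# Values taken from the report below; substitute your own.
CLUSTER_ID="cluster-name"
JOB_ID="51e5afd90227d537ff442403d1b279da"

# Flink's Kubernetes HA services store the JobManager leader information
# in a ConfigMap named <cluster-id>-<job-id>-jobmanager-leader.
LEADER_CM="${CLUSTER_ID}-${JOB_ID}-jobmanager-leader"
echo "${LEADER_CM}"

# To check whether it exists in the cluster (requires kubectl access to
# the namespace from the report):
#   kubectl -n kubernetes-namespace get configmap "${LEADER_CM}"
```

If the ConfigMap is missing while the job is supposed to be running, that matches the symptom described in FLINK-20648.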
[1]. https://issues.apache.org/jira/browse/FLINK-20648

Best,
Yang

ChangZhuo Chen (陳昌倬) <czc...@czchen.org> wrote on Wed, Dec 30, 2020 at 9:35 AM:
> Hi,
>
> We cannot start a job from a savepoint (created by Flink 1.12, Standalone
> Kubernetes + ZooKeeper HA) in Flink 1.12, Standalone Kubernetes +
> Kubernetes HA. The following is the exception that stops the job.
>
> Caused by: java.util.concurrent.CompletionException:
> org.apache.flink.kubernetes.kubeclient.resources.KubernetesException:
> Cannot retry checkAndUpdateConfigMap with configMap
> name-51e5afd90227d537ff442403d1b279da-jobmanager-leader because it does
> not exist.
>
> The cluster can start a new job from scratch, so we think the cluster
> configuration is good.
>
> The following is the HA-related config:
>
> high-availability:
>     org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> high-availability.storageDir: gs://some/path/recovery
> kubernetes.cluster-id: cluster-name
> kubernetes.context: kubernetes-context
> kubernetes.namespace: kubernetes-namespace
>
> --
> ChangZhuo Chen (陳昌倬) czchen@{czchen,debconf,debian}.org
> http://czchen.info/
> Key fingerprint = BA04 346D C2E1 FE63 C790 8793 CC65 B0CD EC27 5D5B