Could you please share the full JobManager logs? AFAIK, the exceptions you attached are normal logs emitted while the JobManager is trying to acquire the ConfigMap lock.
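For context, the 409 Conflict in the quoted traces ("the object has been modified; please apply your changes to the latest version and try again") is what Kubernetes returns when an update carries a stale resourceVersion. A rough Java sketch of that optimistic-locking pattern follows. It uses an in-memory stand-in for the API server, and every name in it (OptimisticLockSketch, Store, ConflictException, renewOnce) is hypothetical, not Flink or fabric8 code. It only illustrates why a failed attempt is harmless as long as the next attempt re-reads the latest version.

```java
/**
 * Sketch of optimistic locking with a version counter. All names here are
 * hypothetical; this is not Flink or fabric8 code.
 */
class OptimisticLockSketch {

    /** Analogue of an HTTP 409 Conflict: the caller's version is stale. */
    static class ConflictException extends RuntimeException {
        ConflictException(String message) {
            super(message);
        }
    }

    /** In-memory stand-in for the API server holding a single object. */
    static class Store {
        private long resourceVersion = 1;
        private String data = "initial";

        synchronized long version() {
            return resourceVersion;
        }

        synchronized String data() {
            return data;
        }

        /** Replaces the data only if expectedVersion is still the latest version. */
        synchronized long replace(long expectedVersion, String newData) {
            if (expectedVersion != resourceVersion) {
                throw new ConflictException("the object has been modified");
            }
            data = newData;
            resourceVersion++;
            return resourceVersion;
        }
    }

    /**
     * One renew attempt. A conflict is reported and swallowed, so the caller
     * can re-read the object and try again on its next cycle.
     */
    static boolean renewOnce(Store store, long seenVersion, String newData) {
        try {
            store.replace(seenVersion, newData);
            return true;
        } catch (ConflictException e) {
            System.out.println("DEBUG renew failed, will retry: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        long seen = store.version();              // the renewer reads version 1
        store.replace(store.version(), "other");  // another writer gets in first
        boolean first = renewOnce(store, seen, "renewed");             // stale version, conflict
        boolean second = renewOnce(store, store.version(), "renewed"); // fresh version, succeeds
        System.out.println(first + " " + second + " version=" + store.version());
    }
}
```

In this sketch the first attempt fails only because another writer updated the object between the read and the write; the second attempt, made with the freshly read version, goes through.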
Best,
Yang

On Tue, Aug 31, 2021 at 4:36 AM, houssem <mejrihousse...@gmail.com> wrote:

> Hello, thanks for the response.
>
> I am using Kubernetes standalone application mode, not the native one,
> and this error happens randomly at some point while running the job.
>
> Also, I am using just one replica of the JobManager.
>
> Here are some more logs:
>
> {"@timestamp":"2021-08-30T15:43:44.970+02:00","@version":"1","message":"Exception
> occurred while renewing lock: Unable to update
> ConfigMapLock","logger_name":"io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector",
> "thread_name":"pool-685-thread-1","level":"DEBUG","level_value":10000,"stack_trace":"io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException:
> Unable to update ConfigMapLock
> io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.update(ConfigMapLock.java:108)
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.tryAcquireOrRenew(LeaderElector.java:156)
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.renew(LeaderElector.java:120)
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$null$1(LeaderElector.java:104)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException:
> Failure executing: PUT at:
> https://172.31.64.1/api/v1/namespaces/flink-pushavoo-flink-rec/configmaps/elifibre-00000000000000000000000000000000-jobmanager-leader.
> Message: Operation cannot be fulfilled on configmaps
> \"elifibre-00000000000000000000000000000000-jobmanager-leader\": the object
> has been modified; please apply your changes to the latest version and try
> again.
> Received status: Status(apiVersion=v1, code=409,
> details=StatusDetails(causes=[], group=null, kind=configmaps,
> name=elifibre-00000000000000000000000000000000-jobmanager-leader,
> retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status,
> message=Operation cannot be fulfilled on configmaps
> \"elifibre-00000000000000000000000000000000-jobmanager-leader\": the
> object has been modified;
> please apply your changes to the latest version and try again,
> metadata=ListMeta(_continue=null, remainingItemCount=null,
> resourceVersion=null, selfLink=null, additionalProperties={}),
> reason=Conflict, status=Failure, additionalProperties={}).
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:507)
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:471)
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleReplace(OperationSupport.java:289)
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleReplace(OperationSupport.java:269)
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleReplace(BaseOperation.java:820)
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:86)
> io.fabric8.kubernetes.api.model.DoneableConfigMap.done(DoneableConfigMap.java:26)
> io.fabric8.kubernetes.api.model.DoneableConfigMap.done(DoneableConfigMap.java:5)
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:92)
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:36)
> io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.update(ConfigMapLock.java:106)
> ... 10 common frames omitted\n"}
>
> **********************************************************************************************************
>
> On 2021/08/30 10:53:10, Roman Khachatryan <ro...@apache.org> wrote:
> > Hello,
> >
> > Do I understand correctly that you are using native Kubernetes
> > deployment in application mode;
> > and the issue *only* happens if you set kubernetes-jobmanager-replicas
> > [1] to a value greater than 1?
> >
> > Does it happen during deployment or at some point while running the job?
> >
> > Could you share Flink and Kubernetes versions and HA configuration
> > [2]? (I'm assuming you're using Kubernetes for HA, not ZK).
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#kubernetes-jobmanager-replicas
> > [2] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/
> >
> > Regards,
> > Roman
> >
> > On Fri, Aug 27, 2021 at 2:31 PM mejri houssem <mejrihousse...@gmail.com> wrote:
> > >
> > > Hello, I am deploying a Flink application cluster with Kubernetes HA
> > > mode, but I am facing this recurrent problem and I don't know how to
> > > solve it.
> > >
> > > Any help would be appreciated.
> > >
> > > This is the log of the JobManager:
> > >
> > > {"@timestamp":"2021-08-27T14:19:42.447+02:00","@version":"1","message":"Exception
> > > occurred while renewing lock: Unable to update
> > > ConfigMapLock","logger_name":"io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector","thread_name":"pool-4092-thread-1","level":"DEBUG","level_value":10000,"stack_trace":"io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException:
> > > Unable to update ConfigMapLock\n\tat
> > > io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.update(ConfigMapLock.java:108)\n\tat
> > > io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.tryAcquireOrRenew(LeaderElector.java:156)\n\tat
> > > io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.renew(LeaderElector.java:120)\n\tat
> > > io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$null$1(LeaderElector.java:104)\n\tat
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat
> > > java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat
> > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)\n\tat
> > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)\n\tat
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat
> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat
> > > java.lang.Thread.run(Thread.java:748)\nCaused by:
> > > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing:
> > > PUT at:
> > > https://172.31.64.1/api/v1/namespaces/flink-pushavoo-flink-rec/configmaps/elifibre-00000000000000000000000000000000-jobmanager-leader.
> > > Message: Operation cannot be fulfilled on configmaps
> > > \"elifibre-00000000000000000000000000000000-jobmanager-leader\": the object
> > > has been modified; please apply your changes to the latest version and try
> > > again. Received status: Status(apiVersion=v1, code=409,
> > > details=StatusDetails(causes=[], group=null, kind=configmaps,
> > > name=elifibre-00000000000000000000000000000000-jobmanager-leader,
> > > retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status,
> > > message=Operation cannot be fulfilled on configmaps
> > > \"elifibre-00000000000000000000000000000000-jobmanager-leader\": the object
> > > has been modified; please apply your changes to the latest version and try
> > > again, metadata=ListMeta(_continue=null, remainingItemCount=null,
> > > resourceVersion=null, selfLink=null, additionalProperties={}),
> > > reason=Conflict, status=Failure, additionalProperties={}).\n\tat
> > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)\n\tat
> > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:507)\n\tat
> > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:471)\n\tat
> > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)\n\tat
> > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleReplace(OperationSupport.java:289)\n\tat
> > > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleReplace(OperationSupport.java:269)\n\tat
> > > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleReplace(BaseOperation.java:820)\n\tat
> > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:86)\n\tat
> > > io.fabric8.kubernetes.api.model.DoneableConfigMap.done(DoneableConfigMap.java:26)\n\tat
> > > io.fabric8.kubernetes.api.model.DoneableConfigMap.done(DoneableConfigMap.java:5)\n\tat
> > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:92)\n\tat
> > > io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:36)\n\tat
> > > io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.update(ConfigMapLock.java:106)\n\t...
> > > 10 common frames omitted\n"}