Hi, I think Roman is right. It seems that the JobManager is relaunched by K8s after Flink has already deregistered the application (i.e., deleted the JobManager K8s deployment).
One possible reason might be that the kubelet learns too late that the JobManager deployment has been deleted, so it relaunches the JobManager pod when the pod terminates with exit code 0.

Best,
Yang

Roman Khachatryan <ro...@apache.org> wrote on Tue, Oct 26, 2021 at 6:17 PM:
> Thanks for sharing this,
> The sequence of events in the log seems strange to me:
>
> 2021-10-17 03:05:55,801 INFO
> org.apache.flink.runtime.jobmaster.JobMaster [] -
> Close ResourceManager connection c1092812cfb2853a5576ffd78e346189:
> Stopping JobMaster for job 'rt-match_12.4.5_8d48b21a'
> (00000000000000000000000000000000).
> 2021-10-17 03:05:59,382 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] -
> Starting KubernetesApplicationClusterEntrypoint (Version: 1.14.0,
> Scala: 2.12, Rev:460b386, Date:2021-09-22T08:39:40+02:00)
> 2021-10-17 03:06:00,251 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] -
> RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
> 2021-10-17 03:06:04,355 ERROR
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector []
> - Exception occurred while acquiring lock 'ConfigMapLock: flink-ns -
> match-70958037-f414-4925-9d60-19e90d12abc0-restserver-leader
> (ef5c2463-2d66-4dce-a023-4b8a50d7acff)'
>
> io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException:
> Unable to create ConfigMapLock
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException:
> Operation: [create] for kind: [ConfigMap] with name:
> [match-70958037-f414-4925-9d60-19e90d12abc0-restserver-leader] in
> namespace: [flink-ns] failed.
> Caused by: java.io.InterruptedIOException
>
> It looks like KubernetesApplicationClusterEntrypoint is re-started in
> the middle of shutdown and, as a result, the resources it (re)creates
> aren't cleaned up.
>
> Could you please also share the Kubernetes logs and resource definitions
> to validate the above assumption?
>
> Regards,
> Roman
>
> On Mon, Oct 25, 2021 at 6:15 AM Hua Wei Chen <oscar.chen....@gmail.com> wrote:
> >
> > Hi all,
> >
> > We have Flink jobs that run in batch mode, and we get the job status
> > via JobHandler.onJobExecuted() [1].
> >
> > Based on the thread [2], we expected the ConfigMaps to be cleaned up
> > after a successful execution.
> >
> > However, we found that some ConfigMaps are not cleaned up after the
> > job succeeds. On the other hand, the ConfigMap contents and labels
> > are removed.
> >
> > Here is one of the ConfigMaps:
> >
> > ```
> > apiVersion: v1
> > kind: ConfigMap
> > metadata:
> >   name: match-6370b6ab-de17-4c93-940e-0ce06d05a7b8-resourcemanager-leader
> >   namespace: app-flink
> >   selfLink: >-
> >     /api/v1/namespaces/app-flink/configmaps/match-6370b6ab-de17-4c93-940e-0ce06d05a7b8-resourcemanager-leader
> >   uid: 80c79c87-d6e2-4641-b13f-338c3d3154b0
> >   resourceVersion: '578806788'
> >   creationTimestamp: '2021-10-21T17:06:48Z'
> >   annotations:
> >     control-plane.alpha.kubernetes.io/leader: >-
> >       {"holderIdentity":"3da40a4a-0346-49e5-8d18-b04a68239bf3","leaseDuration":15.000000000,"acquireTime":"2021-10-21T17:06:48.092264Z","renewTime":"2021-10-21T17:06:48.092264Z","leaderTransitions":0}
> >   managedFields:
> >     - manager: okhttp
> >       operation: Update
> >       apiVersion: v1
> >       time: '2021-10-21T17:06:48Z'
> >       fieldsType: FieldsV1
> >       fieldsV1:
> >         'f:metadata':
> >           'f:annotations':
> >             .: {}
> >             'f:control-plane.alpha.kubernetes.io/leader': {}
> > data: {}
> > ```
> >
> > Our Flink apps run on ver. 1.14.0.
> > Thanks!
> >
> > BR,
> > Oscar
> >
> > References:
> > [1] JobListener (Flink : 1.15-SNAPSHOT API) (apache.org)
> > [2] https://lists.apache.org/list.html?user@flink.apache.org:lte=1M:High%20availability%20data%20clean%20up%20
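P.S. For anyone who needs to clean up manually in the meantime: since the labels on the leftover ConfigMaps have already been removed (as Oscar observed), a label selector won't find them, but the leader-election ConfigMaps can still be spotted by their `-leader` name suffix. A minimal sketch of the filter below; the sample names are the ones from this thread plus one hypothetical non-leader entry, and with cluster access you would replace the `printf` with `kubectl get configmaps -n app-flink -o name`:

```shell
# Filter "kubectl get configmaps -o name"-style output down to the
# leader-election ConfigMaps (their names end in "-leader").
# On a real cluster, replace the printf with:
#   kubectl get configmaps -n app-flink -o name
printf '%s\n' \
  'configmap/match-6370b6ab-de17-4c93-940e-0ce06d05a7b8-resourcemanager-leader' \
  'configmap/match-70958037-f414-4925-9d60-19e90d12abc0-restserver-leader' \
  'configmap/some-unrelated-configmap' \
  | grep -- '-leader$'
```

After verifying the list looks right, piping it through `xargs kubectl delete -n app-flink` should remove the leftovers. (The `--` is needed so grep does not treat the pattern, which starts with `-`, as an option.)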