Hey all,

We recently updated to Flink 1.15.1. We deploy a stream cluster in Application mode on native Kubernetes (on Amazon EKS). The cluster is configured with the Kubernetes HA service, a minimum of 3 JobManager replicas, and a pod template that uses topologySpreadConstraints to distribute pods across different availability zones. The HA storage directory is on S3.
The cluster deploys and runs properly. However, after a while we noticed the following exception in a JobManager instance (the log file is enclosed):

[2022-09-04T02:05:33,097][Error] {} [i.f.k.c.e.l.LeaderElector]: Exception occurred while acquiring lock 'ConfigMapLock: dev-0-flink-jobs - data-agg-events-insertion-cluster-config-map (b6da2ae2-ad2b-471c-801e-ea460a348fab)'
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [ConfigMap] with name: [data-agg-events-insertion-cluster-config-map] in namespace: [dev-0-flink-jobs] failed.
Caused by: java.io.FileNotFoundException: /opt/flink/.kube/config (No such file or directory)
    at java.io.FileInputStream.open0(Native Method) ~[?:?]
    at java.io.FileInputStream.open(Unknown Source) ~[?:?]
    at java.io.FileInputStream.<init>(Unknown Source) ~[?:?]
    at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:354) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:15) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3494) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfig(KubeConfigUtils.java:42) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.utils.TokenRefreshInterceptor.intercept(TokenRefreshInterceptor.java:44) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createApplicableInterceptors$6(HttpClientUtils.java:290) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.execute(RealCall.java:81) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:585) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:488) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:470) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:830) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:200) ~[flink-dist-1.15.1.jar:1.15.1]
    ... 12 more

Why is .kube/config needed in native K8s mode? Shouldn't the service account be checked instead? Are we missing something?

Thanks,
Tamir.
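To narrow down the question above, here is a minimal sketch that reports which credential files are visible inside the JobManager container. The kubeconfig path is taken from the FileNotFoundException in the trace. The service account token path is the standard Kubernetes mount location and is only assumed to apply to this pod. The function name `credential_sources` is hypothetical, and the sketch assumes a Python interpreter is available in the container, which may not be the case for the Flink image in use.

```python
from pathlib import Path

# Path reported in the FileNotFoundException in the JobManager log.
KUBECONFIG_PATH = "/opt/flink/.kube/config"

# Standard Kubernetes mount point for a pod's service account token.
# Assumption: the JobManager pod uses the default token mount.
SERVICE_ACCOUNT_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def credential_sources(paths):
    """Return a dict mapping each path to True if it is an existing file."""
    return {str(p): Path(p).is_file() for p in paths}


if __name__ == "__main__":
    report = credential_sources([KUBECONFIG_PATH, SERVICE_ACCOUNT_TOKEN_PATH])
    for path, present in report.items():
        print(f"{path}: {'present' if present else 'missing'}")
```

If the token file is reported as present while the kubeconfig is missing, that would at least confirm that service account credentials are mounted and that the client is looking for a kubeconfig anyway. It would not by itself explain why.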
[2022-09-01T19:25:21,709][Info] {} [o.a.f.r.l.DefaultLeaderElectionService]: Starting DefaultLeaderElectionService with org.apache.flink.runtime.leaderelection.MultipleComponentLeaderElectionDriverAdapter@641cea11.
[2022-09-01T19:25:21,710][Info] {} [o.a.f.r.j.MiniDispatcherRestEndpoint]: Web frontend listening at http://10.227.193.49:8081.
[2022-09-01T19:25:21,797][Info] {} [o.a.f.r.l.DefaultLeaderElectionService]: Starting DefaultLeaderElectionService with org.apache.flink.runtime.leaderelection.MultipleComponentLeaderElectionDriverAdapter@633cad4d.
[2022-09-01T19:25:21,798][Info] {} [o.a.f.r.r.ResourceManagerServiceImpl]: Starting resource manager service.
[2022-09-01T19:25:21,798][Info] {} [o.a.f.r.l.DefaultLeaderElectionService]: Starting DefaultLeaderElectionService with org.apache.flink.runtime.leaderelection.MultipleComponentLeaderElectionDriverAdapter@15c3585.
[2022-09-01T19:25:21,800][Info] {} [o.a.f.k.k.r.KubernetesConfigMapSharedInformer]: Starting to watch for dev-0-flink-jobs/data-agg-events-insertion-cluster-config-map, watching id:cd466ac1-ba9b-4c62-a65b-4be4d7f9b4b6
[2022-09-01T19:25:21,800][Info] {} [o.a.f.r.l.DefaultLeaderRetrievalService]: Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='data-agg-events-insertion-cluster-config-map'}.
[2022-09-01T19:25:21,800][Info] {} [o.a.f.k.k.r.KubernetesConfigMapSharedInformer]: Starting to watch for dev-0-flink-jobs/data-agg-events-insertion-cluster-config-map, watching id:b977a6bd-f706-4055-95d4-e71604bbe657
[2022-09-01T19:25:21,800][Info] {} [o.a.f.r.l.DefaultLeaderRetrievalService]: Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='data-agg-events-insertion-cluster-config-map'}.
[2022-09-01T19:34:13,888][Info] {} [o.a.k.c.NetworkClient]: [Producer clientId=producer-1] Node -3 disconnected.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.jboss.netty.util.internal.ByteBufferUtil (file:/tmp/flink-rpc-akka_fd65d1f4-ddde-4df6-b152-d95b51308356.jar) to method java.nio.DirectByteBuffer.cleaner()
WARNING: Please consider reporting this to the maintainers of org.jboss.netty.util.internal.ByteBufferUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[2022-09-04T02:05:33,097][Error] {} [i.f.k.c.e.l.LeaderElector]: Exception occurred while acquiring lock 'ConfigMapLock: dev-0-flink-jobs - data-agg-events-insertion-cluster-config-map (b6da2ae2-ad2b-471c-801e-ea460a348fab)'
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [ConfigMap] with name: [data-agg-events-insertion-cluster-config-map] in namespace: [dev-0-flink-jobs] failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:205) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:167) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:90) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.get(ConfigMapLock.java:55) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.tryAcquireOrRenew(LeaderElector.java:134) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$acquire$0(LeaderElector.java:82) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$loop$3(LeaderElector.java:198) ~[flink-dist-1.15.1.jar:1.15.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
    at java.util.concurrent.FutureTask.runAndReset(Unknown Source) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
    at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: java.io.FileNotFoundException: /opt/flink/.kube/config (No such file or directory)
    at java.io.FileInputStream.open0(Native Method) ~[?:?]
    at java.io.FileInputStream.open(Unknown Source) ~[?:?]
    at java.io.FileInputStream.<init>(Unknown Source) ~[?:?]
    at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:354) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:15) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3494) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfig(KubeConfigUtils.java:42) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.utils.TokenRefreshInterceptor.intercept(TokenRefreshInterceptor.java:44) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createApplicableInterceptors$6(HttpClientUtils.java:290) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[flink-dist-1.15.1.jar:1.15.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.execute(RealCall.java:81) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:585) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:488) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:470) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:830) ~[flink-dist-1.15.1.jar:1.15.1]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:200) ~[flink-dist-1.15.1.jar:1.15.1]
    ... 12 more