Hey All,

We recently upgraded to Flink 1.15.1. We deploy a streaming cluster in 
Application mode on native Kubernetes (Amazon EKS). The cluster is configured 
with the Kubernetes HA service, a minimum of 3 JobManager replicas, and a pod 
template that uses topologySpreadConstraints to distribute pods across 
different availability zones.
The HA storage directory is on S3.

The cluster deploys and runs properly. However, after a while we noticed the 
following exception in a JobManager instance (the log file is enclosed):

[2022-09-04T02:05:33,097][Error] {} [i.f.k.c.e.l.LeaderElector]: Exception occurred while acquiring lock 'ConfigMapLock: dev-0-flink-jobs - data-agg-events-insertion-cluster-config-map (b6da2ae2-ad2b-471c-801e-ea460a348fab)'
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [ConfigMap]  with name: [data-agg-events-insertion-cluster-config-map]  in namespace: [dev-0-flink-jobs]  failed.
Caused by: java.io.FileNotFoundException: /opt/flink/.kube/config (No such file or directory)
      at java.io.FileInputStream.open0(Native Method) ~[?:?]
      at java.io.FileInputStream.open(Unknown Source) ~[?:?]
      at java.io.FileInputStream.<init>(Unknown Source) ~[?:?]
      at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:354) ~[flink-dist-1.15.1.jar:1.15.1]
      at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:15) ~[flink-dist-1.15.1.jar:1.15.1]
      at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3494) ~[flink-dist-1.15.1.jar:1.15.1]
      at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfig(KubeConfigUtils.java:42) ~[flink-dist-1.15.1.jar:1.15.1]
      at io.fabric8.kubernetes.client.utils.TokenRefreshInterceptor.intercept(TokenRefreshInterceptor.java:44) ~[flink-dist-1.15.1.jar:1.15.1]
      at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
      at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
      at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) ~[flink-dist-1.15.1.jar:1.15.1]
      at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
      at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
      at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createApplicableInterceptors$6(HttpClientUtils.java:290) ~[flink-dist-1.15.1.jar:1.15.1]
      at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
      at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
      at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[flink-dist-1.15.1.jar:1.15.1]
      at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.execute(RealCall.java:81) ~[flink-dist-1.15.1.jar:1.15.1]
      at io.fabric8.kubernetes.client.dsl.base.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:585) ~[flink-dist-1.15.1.jar:1.15.1]
      at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558) ~[flink-dist-1.15.1.jar:1.15.1]
      at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-dist-1.15.1.jar:1.15.1]
      at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:488) ~[flink-dist-1.15.1.jar:1.15.1]
      at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:470) ~[flink-dist-1.15.1.jar:1.15.1]
      at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:830) ~[flink-dist-1.15.1.jar:1.15.1]
      at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:200) ~[flink-dist-1.15.1.jar:1.15.1]
      ... 12 more

Why is a kube config file (/opt/flink/.kube/config) needed on native 
Kubernetes? Shouldn't the pod's service account be used instead?

Are we missing something?
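
For anyone who wants to reproduce the check from inside the JobManager 
container, here is a minimal diagnostic sketch. It only reports which of the 
two credential sources exists on disk. It uses nothing but the JDK and does not 
call the fabric8 client. The class name CredentialSourceCheck and the method 
describe are made up for illustration. The token path is the standard 
Kubernetes service account mount, and I am assuming the kube config path in the 
exception is simply $HOME/.kube/config with the home directory being /opt/flink.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Diagnostic sketch: reports which Kubernetes credential files are readable. */
final class CredentialSourceCheck {

    // Standard mount point of the pod's service account token.
    static final Path SERVICE_ACCOUNT_TOKEN =
            Paths.get("/var/run/secrets/kubernetes.io/serviceaccount/token");

    /** Returns a one-line summary of which of the two files is readable. */
    static String describe(Path tokenFile, Path kubeConfig) {
        boolean hasToken = Files.isReadable(tokenFile);
        boolean hasKubeConfig = Files.isReadable(kubeConfig);
        return "serviceAccountToken=" + hasToken + ", kubeConfig=" + hasKubeConfig;
    }

    public static void main(String[] args) {
        // Assumption: the path in the exception is <user.home>/.kube/config.
        Path kubeConfig = Paths.get(System.getProperty("user.home"), ".kube", "config");
        System.out.println(describe(SERVICE_ACCOUNT_TOKEN, kubeConfig));
    }
}
```

If this prints serviceAccountToken=true and kubeConfig=false in the pod, the 
service account token is mounted as expected, and the open question is why the 
TokenRefreshInterceptor frame in the trace tries to parse a kube config at all.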

Thanks,
Tamir.


[2022-09-01T19:25:21,709][Info] {} [o.a.f.r.l.DefaultLeaderElectionService]: Starting DefaultLeaderElectionService with org.apache.flink.runtime.leaderelection.MultipleComponentLeaderElectionDriverAdapter@641cea11.
[2022-09-01T19:25:21,710][Info] {} [o.a.f.r.j.MiniDispatcherRestEndpoint]: Web frontend listening at http://10.227.193.49:8081.
[2022-09-01T19:25:21,797][Info] {} [o.a.f.r.l.DefaultLeaderElectionService]: Starting DefaultLeaderElectionService with org.apache.flink.runtime.leaderelection.MultipleComponentLeaderElectionDriverAdapter@633cad4d.
[2022-09-01T19:25:21,798][Info] {} [o.a.f.r.r.ResourceManagerServiceImpl]: Starting resource manager service.
[2022-09-01T19:25:21,798][Info] {} [o.a.f.r.l.DefaultLeaderElectionService]: Starting DefaultLeaderElectionService with org.apache.flink.runtime.leaderelection.MultipleComponentLeaderElectionDriverAdapter@15c3585.
[2022-09-01T19:25:21,800][Info] {} [o.a.f.k.k.r.KubernetesConfigMapSharedInformer]: Starting to watch for dev-0-flink-jobs/data-agg-events-insertion-cluster-config-map, watching id:cd466ac1-ba9b-4c62-a65b-4be4d7f9b4b6
[2022-09-01T19:25:21,800][Info] {} [o.a.f.r.l.DefaultLeaderRetrievalService]: Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='data-agg-events-insertion-cluster-config-map'}.
[2022-09-01T19:25:21,800][Info] {} [o.a.f.k.k.r.KubernetesConfigMapSharedInformer]: Starting to watch for dev-0-flink-jobs/data-agg-events-insertion-cluster-config-map, watching id:b977a6bd-f706-4055-95d4-e71604bbe657
[2022-09-01T19:25:21,800][Info] {} [o.a.f.r.l.DefaultLeaderRetrievalService]: Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='data-agg-events-insertion-cluster-config-map'}.
[2022-09-01T19:34:13,888][Info] {} [o.a.k.c.NetworkClient]: [Producer clientId=producer-1] Node -3 disconnected.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.jboss.netty.util.internal.ByteBufferUtil (file:/tmp/flink-rpc-akka_fd65d1f4-ddde-4df6-b152-d95b51308356.jar) to method java.nio.DirectByteBuffer.cleaner()
WARNING: Please consider reporting this to the maintainers of org.jboss.netty.util.internal.ByteBufferUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[2022-09-04T02:05:33,097][Error] {} [i.f.k.c.e.l.LeaderElector]: Exception occurred while acquiring lock 'ConfigMapLock: dev-0-flink-jobs - data-agg-events-insertion-cluster-config-map (b6da2ae2-ad2b-471c-801e-ea460a348fab)'
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [ConfigMap]  with name: [data-agg-events-insertion-cluster-config-map]  in namespace: [dev-0-flink-jobs]  failed.
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:205) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:167) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:90) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ConfigMapLock.get(ConfigMapLock.java:55) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.tryAcquireOrRenew(LeaderElector.java:134) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$acquire$0(LeaderElector.java:82) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$loop$3(LeaderElector.java:198) ~[flink-dist-1.15.1.jar:1.15.1]
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
        at java.util.concurrent.FutureTask.runAndReset(Unknown Source) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
        at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: java.io.FileNotFoundException: /opt/flink/.kube/config (No such file or directory)
        at java.io.FileInputStream.open0(Native Method) ~[?:?]
        at java.io.FileInputStream.open(Unknown Source) ~[?:?]
        at java.io.FileInputStream.<init>(Unknown Source) ~[?:?]
        at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:354) ~[flink-dist-1.15.1.jar:1.15.1]
        at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:15) ~[flink-dist-1.15.1.jar:1.15.1]
        at org.apache.flink.kubernetes.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3494) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfig(KubeConfigUtils.java:42) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.utils.TokenRefreshInterceptor.intercept(TokenRefreshInterceptor.java:44) ~[flink-dist-1.15.1.jar:1.15.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) ~[flink-dist-1.15.1.jar:1.15.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createApplicableInterceptors$6(HttpClientUtils.java:290) ~[flink-dist-1.15.1.jar:1.15.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[flink-dist-1.15.1.jar:1.15.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[flink-dist-1.15.1.jar:1.15.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[flink-dist-1.15.1.jar:1.15.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.execute(RealCall.java:81) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:585) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:558) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:488) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:470) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:830) ~[flink-dist-1.15.1.jar:1.15.1]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:200) ~[flink-dist-1.15.1.jar:1.15.1]
        ... 12 more
        
        
