Hi All,

On long-lived session clusters we are seeing a Kubernetes error:
`Error while watching the ConfigMap`.
The good news is that the `too old resource version` issue appears to be fixed :).

Logs are attached below. Any tips?

Best,
Kevin


2021-02-11 07:55:15,249 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed checkpoint 4 for job 58ec7a029cd31ad057e25479a9979cb4 (202852094 bytes in 49274 ms).
2021-02-11 08:00:15,732 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 5 (type=CHECKPOINT) @ 1613030415249 for job 58ec7a029cd31ad057e25479a9979cb4.
2021-02-11 08:00:25,446 ERROR org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Fatal error occurred in ResourceManager.
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap JOB_NAME-6a3361c3fdeb4dd9ba80d8e667a8093e-jobmanager-leader
    at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
2021-02-11 08:00:25,456 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap JOB_NAME-6a3361c3fdeb4dd9ba80d8e667a8093e-jobmanager-leader
    at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.12-1.12.1.jar:1.12.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
2021-02-11 08:00:25,487 INFO  org.apache.flink.runtime.blob.BlobServer                     [] - Stopped BLOB server at 0.0.0.0:6124
