Hi Kevin,

Unfortunately, the root cause of the error is missing from the logs. I can
only guess, but it could indeed be FLINK-20417 [1]. If that is the case, the
problem should be fixed in the upcoming Flink 1.12.2 release, which will
hopefully be out next week. If it turns out to be a different problem, we
will at least know more, because Flink 1.12.2 also fixes the issue of the
root cause being swallowed. So I would highly recommend upgrading once the
next bug fix release is out.

[1] https://issues.apache.org/jira/browse/FLINK-20417

Cheers,
Till

On Thu, Feb 11, 2021 at 9:21 AM Bohinski, Kevin <kevin_bohin...@comcast.com>
wrote:

> Hi All,
>
> On long-lived session clusters we are seeing a k8s error: `Error while
> watching the ConfigMap`.
> The good news is that the `too old resource version` issue looks fixed :).
>
> Logs are attached below. Any tips?
>
> best
> Kevin
>
>
> 2021-02-11 07:55:15,249 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed
> checkpoint 4 for job 58ec7a029cd31ad057e25479a9979cb4 (202852094 bytes in
> 49274 ms).
> 2021-02-11 08:00:15,732 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] -
> Triggering checkpoint 5 (type=CHECKPOINT) @ 1613030415249 for job
> 58ec7a029cd31ad057e25479a9979cb4.
> 2021-02-11 08:00:25,446 ERROR
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] -
> Fatal error occurred in ResourceManager.
> org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error
> while watching the ConfigMap
> JOB_NAME-6a3361c3fdeb4dd9ba80d8e667a8093e-jobmanager-leader
> at
> org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_282]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_282]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
> 2021-02-11 08:00:25,456 ERROR
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal
> error occurred in the cluster entrypoint.
> org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error
> while watching the ConfigMap
> JOB_NAME-6a3361c3fdeb4dd9ba80d8e667a8093e-jobmanager-leader
> at
> org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
> [flink-dist_2.12-1.12.1.jar:1.12.1]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_282]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_282]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
> 2021-02-11 08:00:25,487 INFO  org.apache.flink.runtime.blob.BlobServer
>                  [] - Stopped BLOB server at 0.0.0.0:6124
>
