I second Till's suggestion.

In the meantime, you could also build your own flink-kubernetes jar from the
source code of the release-1.12 branch, bundle that jar into the image under
the /opt/flink/lib directory, and push the image to your Docker repository.
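A minimal sketch of the image step, assuming the jar has already been built
locally from the release-1.12 branch (for example with
`mvn clean install -DskipTests -Dscala-2.12 -pl flink-kubernetes -am`, using
Scala 2.12 to match the flink-dist_2.12-1.12.1.jar in the logs) and copied
next to the Dockerfile. The base image tag and the jar file name are
assumptions; use the name of the jar that actually ends up under
flink-kubernetes/target/.

```dockerfile
# Start from the same Flink version and Scala binary version the cluster already runs.
FROM flink:1.12.1-scala_2.12

# Hypothetical file name: substitute the jar produced by your release-1.12 build.
COPY flink-kubernetes_2.12-1.12-SNAPSHOT.jar /opt/flink/lib/
```

Then build and push it, e.g. `docker build -t <registry>/flink:1.12.1-patched .`
followed by `docker push <registry>/flink:1.12.1-patched` (registry and tag are
placeholders), and point the session cluster at the new image via the
`kubernetes.container.image` option.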

Some users have run into the same issue as you and have verified that the
"too old resource version" fix works well for them.


Best,
Yang

Till Rohrmann <trohrm...@apache.org> wrote on Fri, Feb 12, 2021 at 1:20 AM:

> Hi Kevin,
>
> Unfortunately, the root cause of the error is missing. I can only guess,
> but it could indeed be FLINK-20417 [1]. If this is the case, then the
> problem should be fixed with the upcoming Flink 1.12.2 release, which will
> hopefully be released next week. If it turns out to be a different problem,
> we will know more, because Flink 1.12.2 also fixes the problem of the root
> cause being swallowed. So I would highly recommend upgrading once the next
> bugfix release is out.
>
> [1] https://issues.apache.org/jira/browse/FLINK-20417
>
> Cheers,
> Till
>
> On Thu, Feb 11, 2021 at 9:21 AM Bohinski, Kevin <
> kevin_bohin...@comcast.com> wrote:
>
>> Hi All,
>>
>> On long-lived session clusters we are seeing a k8s error, `Error while
>> watching the ConfigMap`.
>> The good news is that the `too old resource version` issue looks fixed :).
>>
>> Logs are attached below. Any tips?
>>
>> best
>> Kevin
>>
>>
>> 2021-02-11 07:55:15,249 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed checkpoint 4 for job 58ec7a029cd31ad057e25479a9979cb4 (202852094 bytes in 49274 ms).
>> 2021-02-11 08:00:15,732 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 5 (type=CHECKPOINT) @ 1613030415249 for job 58ec7a029cd31ad057e25479a9979cb4.
>> 2021-02-11 08:00:25,446 ERROR org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Fatal error occurred in ResourceManager.
>> org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap JOB_NAME-6a3361c3fdeb4dd9ba80d8e667a8093e-jobmanager-leader
>> at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
>> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
>> 2021-02-11 08:00:25,456 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
>> org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap JOB_NAME-6a3361c3fdeb4dd9ba80d8e667a8093e-jobmanager-leader
>> at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.12-1.12.1.jar:1.12.1]
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
>> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
>> 2021-02-11 08:00:25,487 INFO  org.apache.flink.runtime.blob.BlobServer                     [] - Stopped BLOB server at 0.0.0.0:6124
>>
>