Task manager fails to start in HA mode flink 1.19.1

Sigalit Eliazov Wed, 22 Jan 2025 05:25:43 -0800

Hi,

I recently upgraded to *Flink 1.19.1* and am using the *Flink Kubernetes
Operator 1.9* to deploy the Flink cluster. Checkpoint storage is backed by
PersistentVolumeClaims (PVCs), and the service account is configured with
the necessary permissions.


However, when starting the pipeline in *HA mode*, the TaskManager fails
with the following error:

2025-01-21 09:37:02,500 ERROR org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.informers.impl.cache.Reflector [] - listSyncAndWatch failed for v1/namespaces/psp1/configmaps, will stop
java.util.concurrent.CompletionException: java.io.IOException: Unexpected response code for CONNECT: 504
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1141) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) [?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) [?:?]
        at org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$1.onFailure(OkHttpClientImpl.java:320) [flink-dist-1.19.1.jar:1.19.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180) [flink-dist-1.19.1.jar:1.19.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist-1.19.1.jar:1.19.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.io.IOException: Unexpected response code for CONNECT: 504.

I’ve confirmed that I’m using *Fabric8 Kubernetes Client 6.9.2*, which
aligns with the version used by Flink 1.19.1. I also attempted to adjust
the Kubernetes timeouts in the Flink configuration, but the issue persists.

Here is the Kubernetes version information:

   - *Client Version*: v1.23.3
   - *Server Version*: v1.27.10+28ed2d7

Do you have any suggestions for resolving this issue? Any insights or
guidance would be greatly appreciated.

Thanks
