Hi,

I recently upgraded to *Flink 1.19.1* and am using the *Flink Kubernetes
Operator 1.9* to deploy the Flink cluster. Checkpoints are stored on
PersistentVolumeClaims (PVCs), and the service account is configured
with the necessary permissions.

However, when starting the pipeline in *HA mode*, the TaskManager fails
with the following error:

2025-01-21 09:37:02,500 ERROR org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.informers.impl.cache.Reflector [] - listSyncAndWatch failed for v1/namespaces/psp1/configmaps, will stop
java.util.concurrent.CompletionException: java.io.IOException: Unexpected response code for CONNECT: 504
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1141) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) [?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) [?:?]
        at org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$1.onFailure(OkHttpClientImpl.java:320) [flink-dist-1.19.1.jar:1.19.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180) [flink-dist-1.19.1.jar:1.19.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist-1.19.1.jar:1.19.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.io.IOException: Unexpected response code for CONNECT: 504.

I’ve confirmed that I’m using *Fabric8 Kubernetes Client 6.9.2*, which
aligns with the version used by Flink 1.19.1. I also attempted to adjust
the Kubernetes timeouts in the Flink configuration, but the issue persists.

Here is the Kubernetes version information:

   - *Client Version*: v1.23.3
   - *Server Version*: v1.27.10+28ed2d7

Do you have any suggestions for resolving this issue? Any insights or
guidance would be greatly appreciated.

Thanks
