Re: Task manager fails to start in HA mode flink 1.19.1

Sigalit Eliazov Thu, 23 Jan 2025 07:51:07 -0800

Additional info from the debug logs:
in Flink 1.18 I saw the following, and the resourceVersion looks like
the correct one
2025-01-22 20:02:37,771 DEBUG
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.informers.impl.DefaultSharedIndexInformer
[] - Resync skipped due to 0 full resync period for
v1/namespaces/psp1/configmaps
2025-01-22 20:02:39,048 DEBUG
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.informers.impl.cache.Reflector
[] - Listing items (1) for v1/namespaces/psp1/configmaps at v408848717
2025-01-22 20:02:39,049 DEBUG
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.informers.impl.cache.Reflector
[] - Starting watcher for v1/namespaces/psp1/configmaps at *v408848717*
2025-01-22 20:02:39,054 DEBUG
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.internal.
[] - Watching
https://172.30.0.1:443/api/v1/namespaces/psp1/configmaps?fieldSelector=metadata.name%3Dpipeline-flinksql-cluster-config-map&resourceVersion=
*408848717*&timeoutSeconds=600&allowWatchBookmarks=true&watch=true...
2025-01-22 20:02:39,250 DEBUG
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
[] - WebSocket successfully opened


In Flink 1.19 the resourceVersion is zero. Any idea why this could happen?

2025-01-22 20:19:43,671 DEBUG
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.informers.impl.DefaultSharedIndexInformer
[] - Resync skipped due to 0 full resync period for
v1/namespaces/psp1/configmaps
2025-01-22 20:19:44,346 DEBUG
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.http.StandardHttpClient
[] - HTTP operation on url:
https://172.30.0.1:443/api/v1/namespaces/psp1/configmaps?fieldSelector=metadata.name%3Dpipeline-flinksql-cluster-config-map&
*resourceVersion*=0 should be retried after 100 millis because of
IOException
  thanks again
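
To compare the two requests side by side, the query parameters can be pulled out of the logged URLs. The following is only a minimal sketch in Python: the URLs are copied from the logs above with the email emphasis asterisks and line-wrapping artifacts removed, and the helper name `query_params` is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def query_params(url: str) -> dict:
    """Return the query parameters of a logged request URL as a flat dict."""
    parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # parse_qs returns a list per key; each key appears once in these URLs.
    return {key: values[0] for key, values in parsed.items()}

BASE = ("https://172.30.0.1:443/api/v1/namespaces/psp1/configmaps"
        "?fieldSelector=metadata.name%3Dpipeline-flinksql-cluster-config-map")

# URL from the Flink 1.18 log (the watch request).
flink_118 = (BASE + "&resourceVersion=408848717&timeoutSeconds=600"
                    "&allowWatchBookmarks=true&watch=true")
# URL from the Flink 1.19 log, as far as it was printed.
flink_119 = BASE + "&resourceVersion=0"

for name, url in (("1.18", flink_118), ("1.19", flink_119)):
    params = query_params(url)
    print(name,
          "resourceVersion =", params.get("resourceVersion"),
          "| watch parameter present:", "watch" in params)
```

As logged, the 1.18 URL carries `watch=true` while the 1.19 URL shows no `watch` parameter, so the failing 1.19 request may be the informer's initial list call rather than the watch. For list calls the Kubernetes API treats `resourceVersion=0` as "any version is acceptable", so the zero would not by itself indicate a wrong version. This is a reading of the logs, not something confirmed in the thread.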

On Thu, Jan 23, 2025 at 1:04 PM Sigalit Eliazov <e.siga...@gmail.com> wrote:

> I saw that Flink 1.19 upgraded the Fabric8 Kubernetes Client to 6.9.2,
> whereas Flink 1.18 used 6.6.2.
> When running the same jar with Flink 1.18, it works OK.
> Is any additional configuration required for version 1.19 due to this
> upgrade?
>
> thanks again
>
> On Wed, Jan 22, 2025 at 3:24 PM Sigalit Eliazov <e.siga...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I recently upgraded to *Flink 1.19.1* and am using the *Flink Kubernetes
>> Operator 1.9* to deploy the Flink cluster. The checkpoints are defined
>> using PersistentVolumeClaims (PVCs), and the service account is configured
>> with the necessary permissions.
>>
>> However, when starting the pipeline in *HA mode*, the TaskManager fails
>> with the following error:
>>
>> 2025-01-21 09:37:02,500 ERROR 
>> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.informers.impl.cache.Reflector
>>  [] - listSyncAndWatch failed for v1/namespaces/psp1/configmaps, will stop
>> java.util.concurrent.CompletionException: java.io.IOException: 
>> Unexpected response code for CONNECT: 504
>>      at 
>> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
>>  ~[?:?]
>>      at 
>> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
>>  ~[?:?]
>>      at 
>> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1141)
>>  ~[?:?]
>>      at 
>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>>  [?:?]
>>      at 
>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>>  [?:?]
>>      at 
>> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$1.onFailure(OkHttpClientImpl.java:320)
>>  [flink-dist-1.19.1.jar:1.19.1]
>>      at 
>> org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
>>  [flink-dist-1.19.1.jar:1.19.1]
>>      at 
>> org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>>  [flink-dist-1.19.1.jar:1.19.1]
>>      at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>>  [?:?]
>>      at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>>  [?:?]
>>      at java.lang.Thread.run(Thread.java:840) [?:?]
>> Caused by: java.io.IOException: Unexpected response code for CONNECT: 504.
>>
>> I’ve confirmed that I’m using *Fabric8 Kubernetes Client 6.9.2*, which
>> aligns with the version used by Flink 1.19.1. I also attempted to adjust
>> the Kubernetes timeouts in the Flink configuration, but the issue persists.
>>
>> Here is the Kubernetes version information:
>>
>>    - *Client Version*: v1.23.3
>>    - *Server Version*: v1.27.10+28ed2d7
>>
>> Do you have any suggestions for resolving this issue? Any insights or
>> guidance would be greatly appreciated.
>>
>> Thanks
>>
>
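
CONNECT is the HTTP method a client uses to open a tunnel through a proxy, so the `Unexpected response code for CONNECT: 504` in the quoted trace suggests the request went through a proxy instead of reaching the API server directly. Whether the TaskManager pod actually has proxy settings is an assumption; nothing in the thread confirms it. A minimal sketch for checking the pod's environment, where the variable names are the conventional ones and the helper names are made up for illustration:

```python
import os

# Conventional proxy variable names; which ones the client honours is an assumption.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY")

def proxy_settings(env=None):
    """Collect proxy-related variables, checking upper- and lower-case names."""
    env = os.environ if env is None else env
    found = {}
    for name in PROXY_VARS:
        for key in (name, name.lower()):
            if key in env:
                found[key] = env[key]
    return found

def host_listed_in_no_proxy(host, settings):
    """True if host appears verbatim in the comma-separated NO_PROXY value.

    This is a plain string comparison; a real client may also match
    domain suffixes or CIDR ranges.
    """
    raw = settings.get("NO_PROXY", settings.get("no_proxy", ""))
    entries = [entry.strip() for entry in raw.split(",") if entry.strip()]
    return host in entries

settings = proxy_settings()
print("proxy variables:", settings or "none set")
# 172.30.0.1 is the API server address that appears in the logged URLs.
print("172.30.0.1 listed in NO_PROXY:",
      host_listed_in_no_proxy("172.30.0.1", settings))
```

If proxy variables are set and the API server address is not excluded, that would be one candidate explanation for the 504, to be checked against the actual pod spec and image.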
