New JM pod tries to connect to failed JM pod

Alexey Trenikhun Mon, 18 Apr 2022 18:25:51 -0700

Hello,
We are running Flink 1.13.6 on Kubernetes with Kubernetes HA; the setup
includes 1 JM and TM. Recently I started to see the following in the
JobManager log:


2022-04-19T00:11:33.102Z Association with remote system 
[akka.tcp://flink@10.204.0.126:6123] has failed, address is now gated for [50] 
ms. Reason: [Association failed with [akka.tcp://flink@10.204.0.126:6123]] 
Caused by: [No response from remote for outbound association. Associate timed 
out after [20000 ms].]

I suspect that the root cause is some network issue. But what is very strange
is that this log is from pod gsp-jm-424--1-8v5qj (10.204.2.138), and
10.204.0.126 is the IP address of the failed JM pod, gsp-jm-424--1-kdhqp. So it
looks like the newer JM instance (10.204.2.138) is trying to connect to the
older, failed JM instance (10.204.0.126).

Thanks,
Alexey
