Hello,
We are running Flink 1.13.6 on Kubernetes with Kubernetes HA; the setup includes
one JM and TM. Recently, I started to see the following in the JobManager log:

2022-04-19T00:11:33.102Z Association with remote system 
[akka.tcp://flink@10.204.0.126:6123] has failed, address is now gated for [50] 
ms. Reason: [Association failed with [akka.tcp://flink@10.204.0.126:6123]] 
Caused by: [No response from remote for outbound association. Associate timed 
out after [20000 ms].]

I suspect the root cause is some network issue. But what is very strange is
that this log comes from pod gsp-jm-424--1-8v5qj (10.204.2.138), while
10.204.0.126 is the IP address of the failed JM pod gsp-jm-424--1-kdhqp. So it
looks like the newer JM instance (10.204.2.138) is trying to connect to the
older, failed JM instance (10.204.0.126).

Thanks,
Alexey