Hello,

We are running Flink 1.13.6 in Kubernetes with Kubernetes HA; the setup includes 1 JM and TM. Recently I started to see the following in the JobManager log:
2022-04-19T00:11:33.102Z Association with remote system [akka.tcp://flink@10.204.0.126:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@10.204.0.126:6123]] Caused by: [No response from remote for outbound association. Associate timed out after [20000 ms].]

I suspect the root cause is a network issue. What is very strange, though, is that this log comes from pod gsp-jm-424--1-8v5qj (10.204.2.138), while 10.204.0.126 is the IP address of the failed JM pod, gsp-jm-424--1-kdhqp. So it looks like the newer JM instance (10.204.2.138) is trying to connect to the older, failed JM instance (10.204.0.126).

Thanks,
Alexey