Scenario

* A savepoint with cancel, followed by a restore of the job. This brings down
the JM, which is relaunched on a different IP, so the DNS name now resolves to
a new IP.
* The TM deployment is not rolled (recreated).
* Note that `flink-conf.yaml:metrics.internal.query-service.port` is
hardcoded.
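The sequence above can be modelled with a toy resolver. This is a simplified Python sketch, not Flink or Akka code. It assumes, consistent with the timeout quoted below, that the TM side resolved the JM hostname once and kept using that address. The hostname `flink-jobmanager` and the new IP `172.17.9.20` are hypothetical.

```python
from typing import Callable, Dict

Resolver = Callable[[str], str]


class ResolveOnceClient:
    """Resolves the JM hostname once, at startup, and keeps that IP."""

    def __init__(self, host: str, resolve: Resolver) -> None:
        self.host = host
        self.ip = resolve(host)  # cached for the lifetime of the client

    def connect_target(self) -> str:
        return self.ip  # never re-resolved, so it goes stale when the JM moves


class ResolveOnReconnectClient:
    """Resolves the JM hostname again on every (re)connect attempt."""

    def __init__(self, host: str, resolve: Resolver) -> None:
        self.host = host
        self.resolve = resolve

    def connect_target(self) -> str:
        return self.resolve(self.host)  # picks up the JM's new IP


# Toy DNS table standing in for the k8s service record (hypothetical name/IPs).
dns: Dict[str, str] = {"flink-jobmanager": "172.17.6.135"}
resolve: Resolver = lambda name: dns[name]

stale = ResolveOnceClient("flink-jobmanager", resolve)
fresh = ResolveOnReconnectClient("flink-jobmanager", resolve)

# Savepoint + cancel + restore: the JM pod comes back with a new IP.
dns["flink-jobmanager"] = "172.17.9.20"

print(stale.connect_target())  # 172.17.6.135 (old IP, connection times out)
print(fresh.connect_target())  # 172.17.9.20
```

Only the client that re-resolves on reconnect follows the JM to its new address; the one that cached the IP keeps dialing the old one until it is recreated.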




Remote connection to [null] failed with
org.apache.flink.shaded.akka.org.jboss.netty.channel.ConnectTimeoutException:
connection timed out: [dns]/172.17.6.135:6666
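One way to confirm that the TMs are still dialing the JM's old address is to compare the IP in the timeout message with what the name resolves to now, from inside a TM pod. A minimal Python sketch, assuming the message keeps the `host/ip:port` format shown above; `check_log_line` and `StaleCheck` are hypothetical helpers, not part of Flink. Since the hostname is redacted as `[dns]` here, feed it the real log line.

```python
import re
import socket
from typing import Callable, NamedTuple

# Matches the "host/ip:port" part of the ConnectTimeoutException message.
TIMEOUT_RE = re.compile(
    r"connection timed out: (?P<host>[^/\s]+)/(?P<ip>[0-9.]+):(?P<port>\d+)"
)


class StaleCheck(NamedTuple):
    host: str
    logged_ip: str   # IP the TM actually tried to connect to
    current_ip: str  # IP the hostname resolves to right now
    port: int

    @property
    def stale(self) -> bool:
        # True when the TM is still targeting an address DNS no longer returns.
        return self.logged_ip != self.current_ip


def check_log_line(line: str,
                   resolve: Callable[[str], str] = socket.gethostbyname) -> StaleCheck:
    m = TIMEOUT_RE.search(line)
    if m is None:
        raise ValueError("no 'connection timed out: host/ip:port' in line")
    host = m.group("host")
    return StaleCheck(host, m.group("ip"), resolve(host), int(m.group("port")))
```

For example, `check_log_line(line).stale` returning `True` inside a TM pod would point at a cached, outdated JM address rather than an unreachable JM.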

Solution: restart the TM deployment (though that should not be necessary, and
it will cause latency issues on a shared resource manager such as k8s).
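The restart can be scripted with `kubectl rollout restart` (available since kubectl 1.15), which recreates a deployment's pods, followed by `kubectl rollout status` to wait for the new pods. A minimal Python sketch; the deployment name and namespace are placeholders you must supply, and the helper names are hypothetical.

```python
import subprocess
from typing import List


def rollout_restart_cmd(deployment: str, namespace: str) -> List[str]:
    # Recreates the deployment's pods by bumping its pod-template annotation.
    return ["kubectl", "rollout", "restart", f"deployment/{deployment}",
            "-n", namespace]


def rollout_status_cmd(deployment: str, namespace: str) -> List[str]:
    # Blocks until the rollout completes (non-zero exit if it fails).
    return ["kubectl", "rollout", "status", f"deployment/{deployment}",
            "-n", namespace]


def restart_taskmanagers(deployment: str, namespace: str) -> None:
    # check=True raises CalledProcessError if kubectl reports a failure.
    subprocess.run(rollout_restart_cmd(deployment, namespace), check=True)
    subprocess.run(rollout_status_cmd(deployment, namespace), check=True)
```

Usage, with a hypothetical TM deployment `flink-taskmanager` in namespace `flink`: `restart_taskmanagers("flink-taskmanager", "flink")`. This still replaces every TM pod, so it does not avoid the latency cost mentioned above.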

PS: I am sure that a cancel/restart, or a restart of the JM for any reason,
will cause the same issue (not tested).



Regards
