When I deploy a Flink 1.7 job to Kubernetes, the job itself runs, but when I
open the Flink UI I see no metrics, and the jobmanager's log contains WARN
messages like this:

[flink-metrics-14] WARN akka.remote.ReliableDeliverySupervisor flink-metrics-akka.remote.default-remote-dispatcher-3 - Association with remote system [akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491] has failed, address is now gated for [50] ms.
Reason: [Association failed with [akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491]]
Caused by: [adhoc-historical-taskmanager-d4b65dfd4-h5nrx: Name or service not known]

Note: adhoc-historical-taskmanager-d4b65dfd4-h5nrx is the hostname of the pod
on which the taskmanager is running.

So the jobmanager tries to connect to the taskmanager by its pod hostname
(which it probably got from the taskmanager itself) on a random port (44491
here), and fails because it cannot resolve that hostname. How can this be
mitigated?
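To confirm that this is purely a name-resolution problem (rather than, say, a
blocked port), one could check from inside the jobmanager pod whether the
taskmanager's pod hostname resolves at all. A minimal sketch, assuming Python 3
is available in the container (the stock Flink image may not ship it); the
helper name `can_resolve` and the script name are hypothetical:

```python
import socket
import sys


def can_resolve(hostname: str) -> bool:
    """Return True if `hostname` resolves via the local resolver, else False."""
    try:
        # Uses the same system resolver (/etc/hosts, cluster DNS) as other processes.
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # On glibc systems this is typically "[Errno -2] Name or service not known",
        # i.e. the same failure reported in the jobmanager log.
        return False


if __name__ == "__main__":
    # Usage (hypothetical file name): python3 check_resolve.py <hostname> [...]
    for host in sys.argv[1:]:
        status = "resolves" if can_resolve(host) else "does NOT resolve"
        print(f"{host}: {status}")
```

For example, `python3 check_resolve.py adhoc-historical-taskmanager-d4b65dfd4-h5nrx`
run inside the jobmanager pod should report "does NOT resolve" if the log above
is accurate.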
