I have been working with Flink on Kubernetes recently and have run into some problems with metrics. I think I have it figured out, though. It appears that the jobmanager is trying to reach the taskmanagers through hostname resolution, which causes this error:
Association with remote system [akka.tcp://flink@flink-taskmanager-7dffcf7975-vb2pc:42028] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-7dffcf7975-vb2pc:42028]] Caused by: [flink-taskmanager-7dffcf7975-vb2pc]

I noticed that if I put hosts file entries on the jobmanager for each of the taskmanagers, everything started working. Is there a way to specify the hostname of a taskmanager, like you can with the jobmanager?

-Steve