I'm running Flink 1.7.2 in a Docker swarm. Intermittently, new task managers
will fail to resolve their own host names when starting up. In the log I see
"no hostname could be resolved" messages coming from TaskManagerLocation. The
web UI on the jobmanager shows the taskmanagers as associated/connected with
the jobmanager, but their akka paths show their IP rather than the container
name that 'good' taskmanagers show. Those taskmanagers that are listed by IP
give 'failed to connect' errors when new jobs that try to use them are
started, and those jobs eventually fail. However, the affected taskmanagers
still send regular heartbeats to the jobmanager, so the jobmanager keeps
trying to assign work to them. Does anyone know what's going on here?
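
In case it helps anyone reproduce this, here is a minimal sketch of a check that could be run inside an affected container. It assumes (I have not confirmed this against the Flink source) that the message appears when a reverse DNS lookup on the taskmanager's address hands back the IP literal instead of a name, which is what Java's InetAddress.getCanonicalHostName() does when the lookup fails. The class name ReverseLookupCheck and the helper isUnresolved are made up for illustration and are not part of Flink.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ReverseLookupCheck {

    // Hypothetical helper: treats the lookup as failed when the "canonical
    // name" is just the textual IP address, which is what InetAddress falls
    // back to when reverse resolution does not succeed.
    public static boolean isUnresolved(String canonicalName, String hostAddress) {
        return canonicalName.equals(hostAddress);
    }

    public static void main(String[] args) throws UnknownHostException {
        // With no argument, check this container's own address; otherwise
        // check the host name or IP given on the command line.
        InetAddress addr = args.length > 0
                ? InetAddress.getByName(args[0])
                : InetAddress.getLocalHost();

        String ip = addr.getHostAddress();
        // Performs a reverse lookup through the configured name service.
        String canonical = addr.getCanonicalHostName();

        if (isUnresolved(canonical, ip)) {
            System.out.println("No host name resolved for " + ip);
        } else {
            System.out.println(ip + " resolves to " + canonical);
        }
    }
}
```

Running it with no arguments at container start, and again a little later, might show whether the name only becomes resolvable after the swarm DNS has caught up. If getLocalHost() itself throws UnknownHostException, that would also point at the container's own name not being resolvable yet.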