Hi Martin,

Could you `docker exec` into the problematic taskmanager and check whether its hostname can be resolved to the correct IP? You can use `nslookup {tm_hostname}` to verify.
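It may also help to check name resolution from the JVM's point of view, since the warning comes from Flink rather than from the OS tools. Below is a minimal sketch, assuming the warning is triggered when a reverse lookup of the taskmanager's address returns only the IP string. As far as I understand, that is roughly what TaskManagerLocation checks, but treat it as an approximation rather than Flink's actual code. The class name `HostnameCheck` and its helper `fellBackToIp` are hypothetical and not part of Flink.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic class (not part of Flink); run it inside the taskmanager container.
public class HostnameCheck {

    // True when the reverse lookup returned nothing but the IP string,
    // i.e. the case in which the IP ends up being used as the host name.
    static boolean fellBackToIp(String canonicalName, String ipAddress) {
        return canonicalName.equals(ipAddress);
    }

    public static void main(String[] args) throws UnknownHostException {
        // No argument: check the container's own address (this throws
        // UnknownHostException if the container's own hostname does not resolve).
        // With an argument: check the given hostname or IP instead.
        InetAddress addr = args.length > 0
                ? InetAddress.getByName(args[0])
                : InetAddress.getLocalHost();

        String ip = addr.getHostAddress();
        // getCanonicalHostName() does a reverse lookup and returns the
        // textual IP address when no name can be found.
        String canonical = addr.getCanonicalHostName();

        System.out.println("address:        " + ip);
        System.out.println("canonical name: " + canonical);
        if (fellBackToIp(canonical, ip)) {
            System.out.println("No hostname could be resolved; the IP would be used as host name.");
        } else {
            System.out.println("Reverse lookup OK.");
        }
    }
}
```

The Flink 1.7 images may ship only a JRE, so you would compile this elsewhere with `javac HostnameCheck.java`, copy the class file in with `docker cp`, and run it with `docker exec <container> java -cp <dir> HostnameCheck` (`<container>` and `<dir>` are placeholders). Comparing the output of a "good" and a "bad" taskmanager should show whether the reverse lookup is what differs.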
Best,
Yang

Martin, Nick J [US] (IS) <nick.mar...@ngc.com> wrote on Saturday, December 21, 2019 at 6:07 AM:

> I’m running Flink 1.7.2 in a Docker swarm. Intermittently, new task
> managers fail to resolve their own host names when starting up. In the
> log I see “no hostname could be resolved” messages coming from
> TaskManagerLocation. The web UI on the jobmanager shows these taskmanagers
> as associated/connected with the jobmanager, but their akka paths show
> their IP rather than the container name that the ‘good’ taskmanagers show.
> Those taskmanagers that are listed by IP give ‘failed to connect’ errors
> when new jobs that try to use them are started, and those jobs
> eventually fail. But the taskmanagers in this state still send
> regular heartbeats to the jobmanager, so the jobmanager keeps trying to
> assign work to them. Does anyone know what’s going on here?