Yes, the container seems to be resolving its own host name correctly (the Flink 
docker image doesn’t come with nslookup installed, but pinging by host name 
worked). When I did the check, it had been a considerable time since the 
container started, so I can’t rule out a race condition between Flink startup 
and container hostname assignment.
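
Since nslookup isn't in the image, a similar check can be scripted. The sketch 
below is only an illustration and assumes a Python interpreter is available 
inside the container, which the stock Flink image may not provide. The function 
name `check_self_resolution` and the fields it returns are hypothetical. It does 
a forward lookup of the container's host name and then a reverse lookup of the 
resulting IP. A failing reverse lookup is one plausible way to end up with an IP 
where a host name is expected, but that is an assumption about the cause, not 
something confirmed here.

```python
import socket


def reverse_lookup(ip):
    """Return the name the resolver reports for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None


def check_self_resolution(hostname=None,
                          forward=socket.gethostbyname,
                          reverse=reverse_lookup):
    """Forward-resolve a host name, then reverse-resolve the resulting IP.

    The forward/reverse parameters exist only so the resolver can be
    swapped out when trying the function without real DNS.
    """
    hostname = hostname or socket.gethostname()
    try:
        ip = forward(hostname)
    except OSError as exc:
        # Forward lookup failed: the container cannot resolve its own name.
        return {"hostname": hostname, "ip": None,
                "reverse_name": None, "ok": False, "error": str(exc)}
    name = reverse(ip)
    # "ok" means both directions produced an answer.
    return {"hostname": hostname, "ip": ip,
            "reverse_name": name, "ok": name is not None, "error": None}
```

Running `print(check_self_resolution())` right after container start and again a 
few minutes later would show whether the answer changes over time, which is what 
a startup race would look like.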

Another weird thing I noticed is that the IP being reported by the Jobmanager 
in place of the host name isn’t for an individual container. Instead, it’s the 
virtual IP for the whole task manager service. That seems strange, since the 
hostname that points to the taskmanager service isn’t something I put in 
Flink’s config files anywhere, and I don’t think containers should be referring 
to themselves by that name.
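
To tell a per-container address from the service's virtual IP, the address 
shown in the web UI can be compared with what the swarm's DNS returns. This is a 
rough sketch, assuming the default swarm setup in which the service name 
resolves to the virtual IP and `tasks.<service>` resolves to the individual task 
addresses. The service name `taskmanager` used in the note below and the 
function names are placeholders, and it again assumes Python is available in 
the container.

```python
import socket


def resolve_all(name):
    """Return the set of IPv4 addresses the resolver gives for name."""
    try:
        return set(socket.gethostbyname_ex(name)[2])
    except OSError:
        # Name does not resolve at all.
        return set()


def classify_ip(ip, service, resolve=resolve_all):
    """Say whether ip looks like a single task's address or the service VIP.

    Task addresses are checked first, because with DNS round-robin
    endpoints the bare service name also resolves to task addresses.
    """
    if ip in resolve("tasks." + service):
        return "individual task IP"
    if ip in resolve(service):
        return "service virtual IP"
    return "unknown"
```

For example, `classify_ip("<ip from the web UI>", "taskmanager")` run from any 
container on the same overlay network should report which kind of address the 
Jobmanager is showing.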

From: Yang Wang [mailto:danrtsey...@gmail.com]
Sent: Sunday, December 22, 2019 7:15 PM
To: Martin, Nick J [US] (IS) <nick.mar...@ngc.com>
Cc: user <user@flink.apache.org>
Subject: EXT :Re: Taskmanagers in Docker Fail to Resolve Own Hostnames and 
Won't Accept Tasks

Hi Martin,

Could you `docker exec` into the problematic taskmanager and check whether the 
hostname can be resolved to the correct IP? You could use 
`nslookup {tm_hostname}` to verify.


Best,
Yang

Martin, Nick J [US] (IS) <nick.mar...@ngc.com> wrote on Saturday, 
December 21, 2019 at 6:07 AM:
I’m running Flink 1.7.2 in a Docker swarm. Intermittently, new task managers 
will fail to resolve their own host names when starting up. In the log I see 
“no hostname could be resolved” messages coming from TaskManagerLocation. The 
webUI on the jobmanager shows the taskmanagers as associated/connected with 
the jobmanager, but their akka paths show their IP, rather than the container 
name that ‘good’ taskmanagers show. Those taskmanagers that are listed by IP 
give ‘failed to connect’ errors when new jobs are started that try to use those 
taskmanagers, and that job eventually fails. But the taskmanagers with this 
condition still give regular heartbeats to the Jobmanager, so the jobmanager 
keeps trying to assign work to them. Does anyone know what’s going on here?
