Weike Dong created FLINK-19677:
----------------------------------

             Summary: TaskManager takes abnormally long time to register with 
JobManager on Kubernetes
                 Key: FLINK-19677
                 URL: https://issues.apache.org/jira/browse/FLINK-19677
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Task
    Affects Versions: 1.11.2, 1.11.1, 1.11.0
            Reporter: Weike Dong


When a TaskManager registers, the JobManager creates a 
_TaskManagerLocation_ instance, which tries to obtain the hostname of the 
TaskManager via a reverse DNS lookup.

However, this always fails in a Kubernetes environment: for pods that are 
not exposed by Services, CoreDNS cannot resolve their IPs to domain names, 
so _InetAddress#getCanonicalHostName()_ takes ~5 seconds to return, 
blocking the whole registration process.
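
To observe the delay described above, a minimal sketch along these lines can 
be run inside the cluster. The class name _ReverseLookupTimer_ and its method 
are hypothetical and not part of Flink; only _InetAddress_ is a real JDK API. 
The measured time depends entirely on the resolver setup of the environment.

```java
import java.net.InetAddress;

/** Hypothetical helper (not part of Flink) that times one reverse DNS lookup. */
final class ReverseLookupTimer {

    /** Returns the wall-clock milliseconds spent in getCanonicalHostName(). */
    static long timeReverseLookupMillis(InetAddress address) {
        long start = System.nanoTime();
        // Triggers the reverse lookup; if it fails, the JDK falls back to
        // returning the textual IP address instead of throwing.
        String name = address.getCanonicalHostName();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(address.getHostAddress() + " -> " + name
                + " (" + elapsedMillis + " ms)");
        return elapsedMillis;
    }

    public static void main(String[] args) throws Exception {
        // Pass a pod IP as the first argument; defaults to loopback.
        // getByName() with an IP literal only parses it, without any lookup.
        InetAddress address = args.length > 0
                ? InetAddress.getByName(args[0])
                : InetAddress.getLoopbackAddress();
        timeReverseLookupMillis(address);
    }
}
```

Running this against the IP of a pod that is not backed by a Service should 
show whether the lookup stalls in a given cluster.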

Therefore, Flink should provide a configuration parameter to turn off the 
reverse DNS lookup. Also, even when the hostname is actually needed, the 
lookup could be done lazily to avoid blocking registration of other 
TaskManagers.
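
The two proposals above (a switch to disable the lookup, and lazy resolution) 
could be sketched roughly as follows. This is an illustration under 
assumptions, not Flink's actual _TaskManagerLocation_: the class 
_LazyHostLocation_ and the flag _reverseLookupEnabled_ are hypothetical names.

```java
import java.net.InetAddress;

/** Hypothetical sketch (not Flink code): hostname resolved lazily, lookup optional. */
final class LazyHostLocation {

    private final InetAddress address;
    private final boolean reverseLookupEnabled; // assumed to come from configuration
    private volatile String hostName;           // cached after the first call

    LazyHostLocation(InetAddress address, boolean reverseLookupEnabled) {
        // The constructor does no DNS work, so creating the instance
        // during registration never blocks.
        this.address = address;
        this.reverseLookupEnabled = reverseLookupEnabled;
    }

    /** Resolves the hostname on first use; falls back to the IP when disabled. */
    String getHostName() {
        String cached = hostName;
        if (cached == null) {
            cached = reverseLookupEnabled
                    ? address.getCanonicalHostName() // may block on slow DNS
                    : address.getHostAddress();      // textual IP, no lookup
            // Concurrent callers may each resolve once; that is harmless here.
            hostName = cached;
        }
        return cached;
    }
}
```

With the flag set to false, _getHostName()_ returns the textual IP address 
and never touches DNS; with it set to true, the cost of the lookup is paid 
by the first caller that needs the hostname rather than by registration.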



--
This message was sent by Atlassian Jira
(v8.3.4#803005)