Leon zhang created HDFS-14376:
---------------------------------

             Summary: Yarn Client may use stale DNS to connect to RM
                 Key: HDFS-14376
                 URL: https://issues.apache.org/jira/browse/HDFS-14376
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: caching
    Affects Versions: 2.9.1
            Reporter: Leon zhang


This happens more frequently when running Yarn in Kubernetes. When the Yarn 
client tries to connect to the RM and the RM's DNS name is not resolvable, 
because kube-dns has failed or is not ready, the client initializes itself 
with an unresolved InetSocketAddress in RMProxy#newProxyInstance(). The 
connection to the RM then fails with UnknownHostException. The client retries 
the connection through RetryProxy, but it always uses the cached unresolved 
InetSocketAddress, so the retry never succeeds. The bug also occurs when the 
RM is rescheduled to another Kubernetes node, which changes the RM's IP. 
Currently the workaround is to restart the Yarn client. 
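
To illustrate why retrying with the cached address cannot succeed, here is a 
minimal standalone sketch. It is not Hadoop code: the class and method names 
are hypothetical, and the host and port are placeholders. It relies only on 
the fact that java.net.InetSocketAddress is immutable, so an instance that was 
created unresolved stays unresolved no matter how often it is reused.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class, not part of Hadoop.
public class StaleAddressDemo {

    /** Simulates a client that captured its address while DNS was unavailable. */
    public static InetSocketAddress cachedAtStartup(String host, int port) {
        // createUnresolved never performs a lookup, which mimics the state
        // left behind by a failed resolution at client start-up.
        return InetSocketAddress.createUnresolved(host, port);
    }

    /** Counts how many of the given retries would see a resolved address. */
    public static int resolvedAttempts(InetSocketAddress cached, int retries) {
        int resolved = 0;
        for (int i = 0; i < retries; i++) {
            // The same immutable object is reused on every attempt, so its
            // unresolved state cannot change between attempts.
            if (!cached.isUnresolved()) {
                resolved++;
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Placeholder host and port, chosen only for illustration.
        InetSocketAddress cached = cachedAtStartup("127.0.0.1", 8032);
        System.out.println("unresolved: " + cached.isUnresolved());
        System.out.println("resolved attempts: " + resolvedAttempts(cached, 5));
    }
}
```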

This issue happens in both HA and non-HA RM deployments. HDFS has similar issues. 
[https://github.com/apache-spark-on-k8s/kubernetes-HDFS/issues/48]

I propose adding a new RMFailoverProxyProvider called 
AutoRefreshRMFailoverProxyProvider, which resolves the DNS name in the 
overridden getProxy() method. This way, RetryProxy re-resolves the DNS name 
each time it retries. 
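
The re-resolution idea can be sketched roughly as follows. This is only an 
illustration of the approach under my own assumptions, not the proposed 
AutoRefreshRMFailoverProxyProvider itself: the class RefreshingAddressSketch 
and the method addressForNextAttempt() are hypothetical stand-ins for the 
provider and its getProxy() override, and no Hadoop classes are used.

```java
import java.net.InetSocketAddress;

// Hypothetical stand-in for a provider that refreshes its address per attempt.
public class RefreshingAddressSketch {

    private InetSocketAddress current;

    public RefreshingAddressSketch(InetSocketAddress initial) {
        this.current = initial;
    }

    /**
     * Meant to be called before each connection attempt, in the same spirit
     * as resolving DNS inside an overridden getProxy().
     */
    public synchronized InetSocketAddress addressForNextAttempt() {
        // Building a new InetSocketAddress from host name and port makes the
        // JVM attempt a lookup again instead of reusing the cached result.
        InetSocketAddress fresh =
            new InetSocketAddress(current.getHostString(), current.getPort());
        if (!fresh.isUnresolved()) {
            // Keep the fresh result, which also picks up a changed IP.
            current = fresh;
        }
        // If the lookup still fails, the previous address is returned and the
        // next attempt will try to resolve again.
        return current;
    }
}
```

A caller would construct the sketch once with the address it obtained at 
start-up and ask it for an address before every retry, so a later DNS 
recovery or IP change is picked up without restarting the client.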



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
