Leon zhang created HDFS-14376:
---------------------------------

             Summary: Yarn Client may use stale DNS to connect to RM
                 Key: HDFS-14376
                 URL: https://issues.apache.org/jira/browse/HDFS-14376
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: caching
    Affects Versions: 2.9.1
            Reporter: Leon zhang

This happens more frequently when running Yarn in Kubernetes. When the Yarn client tries to connect to the RM and the RM's DNS name is not resolvable, because kube-dns has failed or is not ready, the client initializes itself with an unresolved InetSocketAddress in RMProxy#newProxyInstance(). The connection to the RM then fails with UnknownHostException. The client retries the connection through RetryProxy, but it always uses the cached unresolved InetSocketAddress, so the retry never succeeds.

The bug also occurs when the RM is rescheduled to another Kubernetes node, which changes the RM's IP.

Currently the workaround is to restart the Yarn client. The issue happens in both HA and non-HA RM setups. HDFS has similar issues:
[https://github.com/apache-spark-on-k8s/kubernetes-HDFS/issues/48]

I propose adding a new RMFailoverProxyProvider called AutoRefreshRMFailoverProxyProvider, which resolves the DNS name in its overridden getProxy() method. This way, RetryProxy can re-resolve the DNS name each time it retries.
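
To illustrate the caching behavior described above, here is a minimal sketch using only java.net. It is not the proposed AutoRefreshRMFailoverProxyProvider and does not touch any Hadoop class: AddressRefresher and refresh() are hypothetical names, and port 8032 is only an illustrative value. An InetSocketAddress object never re-resolves itself, so the only way to pick up a new DNS answer is to build a new object from the same host string and port, which is roughly what a refreshing getProxy() would need to do before each retry. An IP literal is used as the host so that the example runs without a DNS server.

```java
import java.net.InetSocketAddress;

// Hypothetical helper for illustration only; not part of Hadoop.
final class AddressRefresher {

    // Builds a new InetSocketAddress from the cached host string and port.
    // The constructor attempts a fresh lookup; if that lookup fails, the
    // returned address is simply unresolved again and the caller can retry.
    static InetSocketAddress refresh(InetSocketAddress cached) {
        return new InetSocketAddress(cached.getHostString(), cached.getPort());
    }

    public static void main(String[] args) {
        // Simulates an address that was cached while DNS was unavailable.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("127.0.0.1", 8032);
        System.out.println("cached address unresolved:    " + stale.isUnresolved());

        // Re-creating the address is what triggers resolution again.
        InetSocketAddress fresh = refresh(stale);
        System.out.println("refreshed address unresolved: " + fresh.isUnresolved());
    }
}
```

Because refresh() always builds a new address, it would also cover the case where the RM moved to a new IP, although the JVM's own lookup cache (the networkaddress.cache.ttl security property) still influences how soon a changed record becomes visible.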