[ https://issues.apache.org/jira/browse/HADOOP-19218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867231#comment-17867231 ]

Viraj Jasani commented on HADOOP-19218:
---------------------------------------

Anyway, if we want to keep the (host + ip) format (available since 3.4.0) for 
the longest lock holder (HDFS-15217), we can still do so with a simple patch:
{code:java}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
index 2cb29dfef8e..4a308bce9cc 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
@@ -8838,6 +8838,9 @@ private Supplier<String> getLockReportInfoSupplier(String src, String dst,
       UserGroupInformation ugi = Server.getRemoteUser();
       String userName = ugi != null ? ugi.toString() : null;
       InetAddress addr = Server.getRemoteIp();
+      if (addr != null) {
+        addr.getHostName();
+      }
       StringBuilder sb = new StringBuilder();
       String s = escapeJava(src);
       String d = escapeJava(dst);
{code}
Otherwise, if we decide to use the same format (ip only) for all types of audit 
logs, including the longest lock holder (HDFS-15217), we will need to update 
the test.
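
For reference, the difference between the two formats can be reproduced with plain java.net.InetAddress, whose toString() prints "hostname/ip" and leaves the hostname part empty when no name is attached to the object. The sketch below is only an illustration: the class name, the 10.0.0.1 address and the client-1.example.com host are made up, and it attaches the name up front instead of doing a reverse lookup so that it runs without DNS. The patch above instead calls getHostName() on the address returned by Server.getRemoteIp(), which I assume performs the reverse lookup and caches the name on the InetAddress.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class InetAddressFormatDemo {
  // Hypothetical client address, used only for illustration.
  private static final byte[] IP = {10, 0, 0, 1};

  // Address built from raw bytes: no host name attached, no DNS lookup.
  static String ipOnly() {
    try {
      return InetAddress.getByAddress(IP).toString();
    } catch (UnknownHostException e) {
      throw new IllegalStateException(e);
    }
  }

  // Address built with a host name attached up front (still no DNS lookup).
  static String hostAndIp() {
    try {
      return InetAddress.getByAddress("client-1.example.com", IP).toString();
    } catch (UnknownHostException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(ipOnly());    // prints "/10.0.0.1"
    System.out.println(hostAndIp()); // prints "client-1.example.com/10.0.0.1"
  }
}
```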

That said, given that 3.4.0 has already shipped with HDFS-15217, we can go with 
the simple fix above. Having the host name is always useful in k8s environments; 
the only optimization needed is to avoid the DNS lookup while creating the IPC 
Connection object, which was the main purpose of this Jira.
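
To illustrate the idea of deferring the lookup, here is a minimal sketch, not the actual change in this Jira. ConnectionInfo, the injected resolver function and the fake "host-" names are all hypothetical; real code would presumably pass something like InetAddress::getHostName as the resolver. The point is only that constructing the per-connection object costs no DNS traffic, and the name is resolved once, the first time something (an error path, a lock report) asks for it.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class LazyHostNameSketch {
  /** Hypothetical stand-in for a per-connection object; not Hadoop's Connection class. */
  static final class ConnectionInfo {
    private final InetAddress addr;
    private final Function<InetAddress, String> resolver;
    private String hostName; // stays null until someone actually needs it

    ConnectionInfo(InetAddress addr, Function<InetAddress, String> resolver) {
      this.addr = addr; // cheap: the constructor does not resolve anything
      this.resolver = resolver;
    }

    /** Resolves on the first call and caches the result for later calls. */
    synchronized String getHostName() {
      if (hostName == null) {
        hostName = resolver.apply(addr);
      }
      return hostName;
    }
  }

  /** Creates n connections (n between 1 and 100), reads the host name of the
   *  first one twice, and returns how many lookups were performed. */
  static int lookupsFor(int n) {
    AtomicInteger lookups = new AtomicInteger();
    // Fake resolver that only counts calls; an assumption made for this demo.
    Function<InetAddress, String> countingResolver = a -> {
      lookups.incrementAndGet();
      return "host-" + a.getHostAddress();
    };
    ConnectionInfo first = null;
    try {
      for (int i = 0; i < n; i++) {
        InetAddress a = InetAddress.getByAddress(new byte[] {10, 0, 0, (byte) (i + 1)});
        ConnectionInfo c = new ConnectionInfo(a, countingResolver);
        if (first == null) {
          first = c;
        }
      }
    } catch (UnknownHostException e) {
      throw new IllegalStateException(e);
    }
    first.getHostName(); // e.g. the error path that needs the name
    first.getHostName(); // second read is served from the cached value
    return lookups.get();
  }

  public static void main(String[] args) {
    System.out.println("lookups: " + lookupsFor(100)); // prints "lookups: 1"
  }
}
```

With eager resolution the same run would have performed one lookup per connection, which is the cost the listener thread was paying.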

> Avoid DNS lookup while creating IPC Connection object
> -----------------------------------------------------
>
>                 Key: HADOOP-19218
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19218
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> Been running HADOOP-18628 in production for quite some time; everything works 
> fine as long as the DNS servers in HA are available. Upgrading a single NS 
> server at a time is also a common case and not problematic. Every DNS lookup 
> takes 1ms in general.
> However, we recently encountered a case where 2 out of 4 NS servers went down 
> (temporarily, but it's a rare case). With a short-duration DNS cache and a 2s 
> NS fallback timeout configured in resolv.conf, any client performing a DNS 
> lookup can now encounter a 4s+ delay. This caused a namenode outage, as the 
> listener thread is single threaded and was not able to keep up with the large 
> number of unique clients (in direct proportion to the number of DNS 
> resolutions every few seconds) initiating connections on the listener port.
> While having 2 out of 4 DNS servers offline is a rare case and the NS fallback 
> settings could also be improved, it is important to note that we don't need 
> to perform DNS resolution for every new connection if the intention is to 
> improve the insights into VersionMismatch errors thrown by the server.
> The proposal is to delay the DNS resolution until the server throws the 
> error for an incompatible header or version mismatch. This would also save 
> the ~1ms of extra time spent even on a healthy DNS lookup.


