Vikram Ahuja created HADOOP-19460:
-------------------------------------

             Summary: High number of Threads Launched when Calling fs.getFileStatus() via proxyUser after Kerberos authentication.
                 Key: HADOOP-19460
                 URL: https://issues.apache.org/jira/browse/HADOOP-19460
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 3.3.6
            Reporter: Vikram Ahuja

We have observed that a very large number of threads is launched when performing concurrent {{fs.getFileStatus(path)}} operations as a proxy user. Although we first saw this in our Hive services, we were able to reproduce it without Hive using a standalone sample program that logs in with a principal and keytab, creates a proxy user, and then issues concurrent {{fs.getFileStatus(path)}} calls for a few minutes. As concurrency increases, the process tries to create more threads than the maximum available (the ulimit) and eventually slows down.

{code:java}
UserGroupInformation proxyUserUGI = UserGroupInformation.createProxyUser(
    "hive", UserGroupInformation.getLoginUser());
{code}

In this case, with 30 concurrent threads calling {{fs.getFileStatus(path)}}, the maximum number of threads launched by the process was 6066.

{code:java}
Every 1.0s: ps -eo nlwp,pid,args --sort -nlwp | head        Wed Feb 19 06:12:47 2025

NLWP    PID COMMAND
6066 700718 /usr/lib/jvm/java-17-openjdk/bin/java -cp ./test.jar:/usr/hadoop/*:/usr/hadoop/lib/*:/usr/hadoop-hdfs/* org.apache.hadoop.hive.common.HDFSFileStatusExample hdfs://namenode:8020 principal keytab_location 30
{code}

The same behaviour is not observed when the same calls are made with the current user's UGI instead of the proxy user.

{code:java}
UserGroupInformation currentUserUgi = UserGroupInformation.getCurrentUser();
{code}

In this case, with 30 concurrent threads calling {{fs.getFileStatus(path)}}, the maximum number of threads launched by the process was 56, and with 500 concurrent threads it was 524.

{code:java}
Every 1.0s: ps -eo nlwp,pid,args --sort -nlwp | head        Tue Feb 18 06:23:18 2025

NLWP    PID COMMAND
  56 748244 /usr/lib/jvm/java-17-openjdk/bin/java -cp ./test.jar:/usr/hadoop/*:/usr/hadoop/lib/*:/usr/hadoop-hdfs/* org.apache.hadoop.hive.common.HDFSFileStatus hdfs://namenode:8020 principal keytab_location 30

Every 1.0s: ps -eo nlwp,pid,args --sort -nlwp | head        Wed Feb 19 06:19:03 2025

NLWP    PID COMMAND
 524 750984 /usr/lib/jvm/java-17-openjdk/bin/java -cp ./test.jar:/usr/hadoop/*:/usr/hadoop/lib/*:/usr/hadoop-hdfs/* org.apache.hadoop.hive.common.HDFSFileStatus hdfs://namenode:8020 principal keytab_location 500
{code}

I am attaching both sample programs: in one the calls are made as the proxy user (the issue occurs here), and in the other the calls are made as the current user (works fine).

The command-line arguments for the sample programs are:

arg[0] = namenode_host_name:port
arg[1] = principal
arg[2] = keytab_location
arg[3] = Number of threads
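
The thread counts above were taken from outside the process with {{ps}}. As a complementary check, the peak could also be sampled from inside the JVM. The following is a minimal sketch using only the standard {{java.lang.management}} API; the class name {{ThreadCountProbe}} and the method {{peakDuring}} are hypothetical and are not part of the attached sample programs. The workload in {{main}} is a placeholder standing in for the {{fs.getFileStatus(path)}} call. Note that this counts live Java threads only, so the figure will generally be lower than NLWP, which also includes JVM-internal native threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: reports the peak number of live Java threads
// observed while a workload runs on a fixed number of worker threads.
final class ThreadCountProbe {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Live Java threads (daemon and non-daemon) in this JVM right now. */
    static int liveThreads() {
        return THREADS.getThreadCount();
    }

    /** Runs {@code task} once on each of {@code workers} threads; returns the peak live thread count. */
    static int peakDuring(int workers, Runnable task) {
        // Reset so the peak reflects this run only, starting from the current live count.
        THREADS.resetPeakThreadCount();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch allStarted = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                // Wait until every worker is running so the calls really overlap.
                allStarted.countDown();
                try {
                    allStarted.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                task.run();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
        }
        return THREADS.getPeakThreadCount();
    }

    public static void main(String[] args) {
        int workers = args.length > 0 ? Integer.parseInt(args[0]) : 30;
        // Placeholder workload (assumption); the reproduction would call fs.getFileStatus(path) here.
        Runnable placeholder = () -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int baseline = liveThreads();
        int peak = peakDuring(workers, placeholder);
        System.out.println("baseline=" + baseline + " peak=" + peak + " workers=" + workers);
    }
}
```

If the peak reported this way stays close to the worker count while NLWP grows into the thousands, the extra threads are being created outside the worker pool, which would be consistent with the proxy-user observation above.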