Rohan Nimmagadda created HDFS-17340:
---------------------------------------
             Summary: transaction lag issue between aNN and oNN causing HDFS_DELEGATION_TOKEN can't be found in cache in oNN
                 Key: HDFS-17340
                 URL: https://issues.apache.org/jira/browse/HDFS-17340
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: dfsclient, hdfs, namenode
    Affects Versions: 3.3.3
            Reporter: Rohan Nimmagadda

We experienced a transaction lag issue between the active NameNode (aNN) and the observer NameNode (oNN) that causes problems in busier clusters. When an HDFS_DELEGATION_TOKEN is created by the aNN, the oNN's token cache does not catch up immediately, so a request that reaches the oNN in the meantime fails because the token cannot be found in its cache.

We followed the documentation at https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html to enable the oNN functionality.

Here is our setup:
* nn1: aNN
* nn2: sNN
* nn3: sNN
* nn4: oNN

Due to heavier read traffic, we decided to add another oNN (nn5) and set dfs.client.failover.random.order=true for better read distribution. Otherwise, all traffic is routed to the first oNN in the list.

With the above setup, the HDFS_DELEGATION_TOKEN issue worsened, and simple MapReduce/Hive jobs started to fail.

Error from the oNN logs:

2024-01-15 11:03:26,152 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.xx.xx.xx:54014:null (DIGEST-MD5: IO error acquiring password) with true cause: (token (token for end-user1: HDFS_DELEGATION_TOKEN owner=end-user1, renewer=end-user1, realUser=, issueDate=1705338205996, maxDate=1705943005996, sequenceNumber=277018178, masterKeyId=2195) can't be found in cache)
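
To make the reported race easier to reason about, here is a minimal toy model in Python. It is only a sketch of the symptom described above, not Hadoop's implementation. The class ToyNameNode and the helpers issue_token and tail_edits are hypothetical names invented for illustration, and the assumption that a token becomes visible on the observer only after it replays the corresponding edit is taken from the report, not verified against the NameNode code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for a NameNode: an applied txid plus a token cache."""
    name: str
    applied_txid: int = 0
    token_cache: set = field(default_factory=set)

    def apply(self, txid: int, token_seq: int) -> None:
        # Applying an edit makes the token visible in this node's cache.
        self.applied_txid = txid
        self.token_cache.add(token_seq)

    def authenticate(self, token_seq: int) -> bool:
        # Mirrors the symptom in the log: a token absent from the cache is rejected.
        return token_seq in self.token_cache


def issue_token(active: ToyNameNode, edit_log: list, token_seq: int) -> int:
    """The active node applies the new token locally and appends it to the shared log."""
    txid = active.applied_txid + 1
    active.apply(txid, token_seq)
    edit_log.append((txid, token_seq))
    return txid


def tail_edits(observer: ToyNameNode, edit_log: list) -> None:
    """The observer catches up by replaying edits it has not applied yet."""
    for txid, token_seq in edit_log:
        if txid > observer.applied_txid:
            observer.apply(txid, token_seq)


edit_log = []
ann = ToyNameNode("nn1-active")
onn = ToyNameNode("nn4-observer")

token_seq = 277018178  # sequenceNumber taken from the log line in the report
issue_token(ann, edit_log, token_seq)

lag = ann.applied_txid - onn.applied_txid
before = onn.authenticate(token_seq)  # observer has not tailed the edit yet
tail_edits(onn, edit_log)
after = onn.authenticate(token_seq)   # observer has caught up

print(f"lag={lag} before_tailing={before} after_tailing={after}")
# prints: lag=1 before_tailing=False after_tailing=True
```

In this model, a client that obtains a token from the active node and immediately contacts an observer that is one transaction behind is rejected, and the same request succeeds once the observer has replayed the edit. Adding a second observer and randomizing the failover order would, under the same assumption, spread requests across more nodes that may each be behind, which is consistent with the report that the problem worsened after nn5 was added.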