[ https://issues.apache.org/jira/browse/HADOOP-19052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shilun Fan resolved HADOOP-19052.
---------------------------------
    Fix Version/s: 3.5.0
                   3.4.1
     Hadoop Flags: Reviewed
 Target Version/s: 3.5.0, 3.4.1
       Resolution: Fixed

> Hadoop uses a shell command to get the hard-link count of a file, which takes a lot of time
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-19052
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19052
>             Project: Hadoop Common
>          Issue Type: Improvement
>         Environment: Hadoop 3.3.4
>            Reporter: liang yu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0, 3.4.1
>
>         Attachments: debuglog.png
>
>
> Using Hadoop 3.3.4.
>
> When the QPS of `append` operations is very high (above 10,000/s), we found that the write speed in Hadoop is very slow. We traced some datanodes' logs and found this warning:
> {code:java}
> 2024-01-26 11:09:44,292 WARN impl.FsDatasetImpl (InstrumentedLock.java:logWaitWarning(165)) Waited above threshold (300 ms) to acquire lock: lock identifier: FsDatasetRWLock waitTimeMs=336 ms. Suppressed 0 lock wait warnings. Longest suppressed waitTimeMs=0. The stack trace is
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1060)
> org.apache.hadoop.util.InstrumentedLock.logWaitWarning(InstrumentedLock.java:171)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:222)
> org.apache.hadoop.util.InstrumentedLock.lock(InstrumentedLock.java:105)
> org.apache.hadoop.util.AutoCloseableLock.acquire(AutoCloseableLock.java:67)
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:1239)
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:230)
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1313)
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:764)
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:176)
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:110)
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:293)
> java.lang.Thread.run(Thread.java:748)
> {code}
>
> We then traced the method _org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:1239)_, logged how long each step of the execution took, and found that it takes about 700 ms just to get the link count of the file, which is very slow.
> !debuglog.png!
>
> Tracing the code further, we found that on Java 1.8 a shell command is used to get the link count. Each call starts a new process and waits for it to fork and finish; when the QPS is very high, forking the process can sometimes take a long time. Here is the shell command:
> {code:java}
> stat -c%h /path/to/file
> {code}
>
> Solution:
> For a FileStore that supports the "unix" file attribute view, we can use _Files.getAttribute(f.toPath(), "unix:nlink")_ to get the link count. This method does not start a new process and returns the result in a very short time.
>
> With this method, we rarely see the WARN log above even when the QPS of append operations is high.
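> A minimal sketch of the idea (not the actual HADOOP-19052 patch; the class and method names here are illustrative): query "unix:nlink" through NIO when the FileStore supports the "unix" attribute view, and only fall back to forking "stat -c%h" otherwise.
> {code:java}
> import java.io.BufferedReader;
> import java.io.File;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.nio.charset.StandardCharsets;
> import java.nio.file.Files;
> import java.nio.file.Path;
>
> public class HardLinkCount {
>
>   public static int getLinkCount(File f) throws IOException {
>     Path path = f.toPath();
>     // Fast path: read the nlink field directly via the "unix" attribute view,
>     // no child process is forked.
>     if (Files.getFileStore(path).supportsFileAttributeView("unix")) {
>       Number nlink = (Number) Files.getAttribute(path, "unix:nlink");
>       return nlink.intValue();
>     }
>     // Slow path: fork "stat -c%h" as before (this fork is what became
>     // expensive under a high append QPS).
>     Process p = new ProcessBuilder("stat", "-c%h", f.getAbsolutePath()).start();
>     try (BufferedReader r = new BufferedReader(
>         new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
>       return Integer.parseInt(r.readLine().trim());
>     }
>   }
> }
> {code}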