[ https://issues.apache.org/jira/browse/HDFS-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephen O'Donnell resolved HDFS-15131.
--------------------------------------
    Resolution: Won't Fix

HDFS-13671 is now committed, which removes the FoldedTreeSet from the code base, so I am closing this as it is no longer relevant.

FoldedTreeSet appears to degrade over time
------------------------------------------

                 Key: HDFS-15131
                 URL: https://issues.apache.org/jira/browse/HDFS-15131
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, namenode
    Affects Versions: 3.3.0
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell
            Priority: Major

We have seen some occurrences of the Namenode getting very slow on delete operations, to the point where incremental block reports (IBRs) get blocked frequently and files fail to close. On one cluster in particular, after about 4 weeks of uptime, the Namenode started responding very poorly. Restarting it corrected the problem for another 4 weeks.

In that example, jstacks of the Namenode consistently pointed to slow HDFS delete calls performing an operation on the FoldedTreeSet structure; every sampled jstack showed a FoldedTreeSet operation at the top of the stack:

{code}
"IPC Server handler 573 on 8020" #663 daemon prio=5 os_prio=0 tid=0x00007fe6a4087800 nid=0x97a6 runnable [0x00007fe67bdfd000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:879)
    at org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:263)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3676)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3507)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4158)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4132)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4069)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4053)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:845)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:308)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:603)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
{code}

The observation in this case was that the Namenode worked fine after a restart, then at some point after about 4 weeks of uptime the problem started and persisted until the Namenode was restarted. The problem then did not return for about another 4 weeks.

On a completely different cluster and version, I recently came across a problem where files were again failing to close (last block does not have sufficient number of replicas) and the datanodes were logging a lot of messages like the following:

{code}
2019-11-27 09:00:49,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 21540ms to process 1 commands from NN
{code}

These messages had a range of durations and were fairly frequent. Focusing on the longer occurrences at around 20 seconds and checking a few of them, the jstacks showed a very similar problem to the NN issue, in that the FoldedTreeSet appears to be at the root of the problem.

The heartbeat thread is waiting on a lock:

{code}
"Thread-26" #243 daemon prio=5 os_prio=0 tid=0x00007f575225a000 nid=0x1d8bae waiting for monitor entry [0x00007f571ed76000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2004)
    - waiting to lock <0x00000006b79baf00> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:699)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:645)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:730)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:539)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:696)
    at java.lang.Thread.run(Thread.java:748)
{code}

What seemed to be blocking it was:

{code}
"DataXceiver for client DFSClient_NONMAPREDUCE_672634602_52 at /10.71.224.51:37620 [Sending block BP-551684957-10.71.224.13-1505987946791:blk_1683565334_609907893]" #142018893 daemon prio=5 os_prio=0 tid=0x00007f573cd25000 nid=0x1912a runnable [0x00007f570acea000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
    - locked <0x00000006b79baf00> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFile(FsDatasetImpl.java:2199)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:1954)
    - locked <0x00000006b79baf00> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:709)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:702)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:301)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
    at java.lang.Thread.run(Thread.java:748)
{code}
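Reading the two stacks together, the DataXceiver read path is holding the coarse FsDatasetImpl monitor while ReplicaMap.get walks the FoldedTreeSet, and the heartbeat thread needs that same monitor to run the invalidate command, so a slow get() turns directly into commands from the NN queuing behind it. A minimal standalone sketch of that pattern (all names here are illustrative, not the real DataNode classes):

{code}
/**
 * Illustrative sketch of the contention pattern in the stacks above: an
 * "xceiver" thread holds one coarse monitor while doing a slow lookup, and a
 * "heartbeat" thread must take the same monitor to process a NN command, so
 * the slow lookup shows up directly as a multi-second command-processing time.
 */
public class CoarseLockDemo {
  private static final Object datasetLock = new Object(); // stands in for the FsDatasetImpl monitor

  public static void main(String[] args) throws InterruptedException {
    // Simulates a DataXceiver read: ReplicaMap.get -> FoldedTreeSet.get under the lock.
    Thread xceiver = new Thread(() -> {
      synchronized (datasetLock) {
        sleepQuietly(20_000);                 // stands in for a degraded ~20 second lookup
      }
    }, "DataXceiver");

    // Simulates the heartbeat thread processing one command from the NN.
    Thread heartbeat = new Thread(() -> {
      long start = System.nanoTime();
      synchronized (datasetLock) {
        // the invalidate work would run here, but all the time is spent waiting for the lock
      }
      long ms = (System.nanoTime() - start) / 1_000_000;
      System.out.println("Took " + ms + "ms to process 1 commands from NN");
    }, "heartbeat");

    xceiver.start();
    Thread.sleep(100);                        // ensure the xceiver grabs the lock first
    heartbeat.start();

    xceiver.join();
    heartbeat.join();
  }

  private static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}

Run as-is, the heartbeat thread reports a wait of roughly 20 seconds, which is the same shape as the "Took 21540ms to process 1 commands from NN" messages above.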
Out of curiosity, we restarted 3 of the DNs. In the 24 hours since the restart, no messages like the above have been logged, although they were being logged at a reasonable frequency before the restart.

I have quite a few jstacks taken at 5 second intervals for some of the DNs on this cluster, and many samples have a runnable thread with this line at the top of the stack, which seems too frequent for a get operation:

{code}
    at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
    - locked <0x00000006b79baf00> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
{code}

After the restart, the jstacks never showed this line in any samples for at least the next week while we were monitoring the cluster.

Another interesting point is that the jstack always shows line 449. This is the method being run, with that line marked:

{code}
  public E get(Object obj, Comparator<?> cmp) {
    Objects.requireNonNull(obj);

    Node<E> node = root;
    while (node != null) {
      E[] entries = node.entries;

      int leftIndex = node.leftIndex;
      int result = compare(obj, entries[leftIndex], cmp);
      if (result < 0) {
        node = node.left;
      } else if (result == 0) {
        return entries[leftIndex];
      } else {
        int rightIndex = node.rightIndex;
        if (leftIndex != rightIndex) {
          result = compare(obj, entries[rightIndex], cmp);
        }
        if (result == 0) {
          return entries[rightIndex];
        } else if (result > 0) {
          node = node.right;
        } else {
          int low = leftIndex + 1;
          int high = rightIndex - 1;
          while (low <= high) {
            int mid = (low + high) >>> 1;
            result = compare(obj, entries[mid], cmp);
            if (result > 0) {
              low = mid + 1;
            } else if (result < 0) {
              high = mid - 1;
            } else {
              return entries[mid];
            }
          }
          return null;
        }
      }
    } // *** This is line 449, which the jstack always has at the top of the stack.
    return null;
  }
{code}

I cannot explain why that line is always the "running" line, unless it is somehow related to safepoints in the JVM.
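That would be consistent with how thread dumps work: the JVM can only observe a thread at a safepoint, and for an uncounted loop like the one above the safepoint poll sits on the loop back edge, so samples of a hot loop tend to pile up on a single line regardless of where the time is actually spent inside the loop body. A quick standalone illustration of that effect (nothing HDFS-specific, purely a sketch):

{code}
import java.util.HashMap;
import java.util.Map;

/**
 * Samples the top stack frame of a thread spinning in a hot, uncounted loop
 * and tallies the reported line numbers. The samples tend to cluster on one
 * or two lines (where the safepoint poll lands) rather than spreading evenly
 * across the loop body.
 */
public class SafepointSampleDemo {
  static volatile long sink;   // keeps the JIT from eliminating the loop body

  public static void main(String[] args) throws InterruptedException {
    Thread worker = new Thread(() -> {
      long x = args.length + 1;
      while (true) {           // uncounted loop: safepoint poll on the back edge
        x ^= x << 13;
        x ^= x >>> 7;
        x ^= x << 17;
        sink = x;
      }
    }, "worker");
    worker.setDaemon(true);
    worker.start();

    // Sample the worker's top frame for a few seconds and tally line numbers.
    Map<Integer, Integer> lineCounts = new HashMap<>();
    for (int i = 0; i < 2_000; i++) {
      StackTraceElement[] frames = worker.getStackTrace();
      if (frames.length > 0) {
        lineCounts.merge(frames[0].getLineNumber(), 1, Integer::sum);
      }
      Thread.sleep(2);
    }
    System.out.println("top-frame line -> samples: " + lineCounts);
  }
}
{code}

Running this for a few seconds typically produces a tally dominated by one or two line numbers, which would explain the samples pinning to FoldedTreeSet.java:449 without telling us which part of the loop is actually slow.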
Based on these two examples and a few other similar issues I have encountered in the Namenode, I believe we have an issue with FoldedTreeSet where it somehow degrades over time, to the point that a simple get operation takes a relatively long time, and it does not seem to be able to recover from this without a process restart.

HDFS-13671 talks about reverting FoldedTreeSet (at least in the Namenode) for performance reasons. However, I wonder if the problems people have seen with it are related to this performance degradation over time rather than the initial performance?
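One way to separate those two possibilities, independently of a live cluster, would be a long-running churn harness that interleaves adds and removes and watches whether lookup latency drifts upward with "uptime". The sketch below is purely illustrative: it uses java.util.TreeSet as a stand-in with equivalent sorted-set semantics, and the real experiment would substitute org.apache.hadoop.hdfs.util.FoldedTreeSet behind the same add/remove/contains/first calls.

{code}
import java.util.Random;
import java.util.TreeSet;

/**
 * Illustrative churn harness: interleave adds and removes for many rounds and
 * periodically time lookups, to see whether lookup latency drifts upward as
 * the structure ages. TreeSet is only a stand-in here; the real test would
 * plug org.apache.hadoop.hdfs.util.FoldedTreeSet in behind the same calls.
 */
public class ChurnLookupTest {
  public static void main(String[] args) {
    TreeSet<Long> set = new TreeSet<>();   // stand-in for the set of block/replica ids
    Random rnd = new Random(42);
    long nextId = 0;

    // Pre-load roughly the working-set size of interest.
    for (int i = 0; i < 1_000_000; i++) {
      set.add(nextId++);
    }

    for (int round = 1; round <= 1_000; round++) {
      // Simulate block churn: drop the oldest ids (old blocks being deleted)
      // and add new ones, keeping the set size constant over time.
      for (int i = 0; i < 10_000; i++) {
        set.remove(set.first());
        set.add(nextId++);
      }

      // Sample lookup latency; if the structure degrades with churn,
      // this should trend upward across rounds.
      long start = System.nanoTime();
      int hits = 0;
      for (int i = 0; i < 100_000; i++) {
        if (set.contains((long) rnd.nextInt((int) nextId))) {
          hits++;
        }
      }
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      System.out.println("round " + round + ": 100k lookups took "
          + elapsedMs + " ms (" + hits + " hits)");
    }
  }
}
{code}

If the per-round lookup time stays flat for TreeSet but climbs for FoldedTreeSet under the same churn, that would support the degradation-over-time theory rather than a problem with its initial performance.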