[ 
https://issues.apache.org/jira/browse/HDFS-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-15131.
--------------------------------------
    Resolution: Won't Fix

HDFS-13671, which removes FoldedTreeSet from the code base, has now been 
committed, so I am closing this issue as it is no longer relevant.

> FoldedTreeSet appears to degrade over time
> ------------------------------------------
>
>                 Key: HDFS-15131
>                 URL: https://issues.apache.org/jira/browse/HDFS-15131
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> We have seen some occurrences of the Namenode getting very slow on delete 
> operations, to the point where IBRs get blocked frequently and files fail to 
> close. On one cluster in particular, after about 4 weeks of uptime, the 
> Namenode started responding very poorly. Restarting it corrected the problem 
> for another 4 weeks. 
> In that example, every jstack captured on the namenode pointed to a slow 
> HDFS delete call that was performing an operation on the FoldedTreeSet 
> structure, for example:
> {code}
> "IPC Server handler 573 on 8020" #663 daemon prio=5 os_prio=0 
> tid=0x00007fe6a4087800 nid=0x97a6 runnable [0x00007fe67bdfd000]
>    java.lang.Thread.State: RUNNABLE
>       at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:879)
>       at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:263)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3676)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3507)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4158)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4132)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4069)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4053)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:845)
>       at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:308)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:603)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
> {code}
> The observation in this case was that the namenode worked fine after a 
> restart, then at some point around 4 weeks of uptime the problem started and 
> persisted until the namenode was restarted again, after which it did not 
> return for about another 4 weeks.
> On a completely different cluster and version, I recently came across a 
> problem where files were again failing to close (last block does not have 
> sufficient number of replicas) and the datanodes were logging a lot of 
> messages like the following:
> {code}
> 2019-11-27 09:00:49,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 21540ms to process 1 commands from NN
> {code}
> These messages had a range of durations and were fairly frequent. Focusing on 
> the longer occurrences, at around 20 seconds, and checking a few of them, the 
> jstacks showed a very similar problem to the NN issue, with FoldedTreeSet 
> again at the root of it.
> The heartbeat thread is waiting on a lock:
> {code}
> "Thread-26" #243 daemon prio=5 os_prio=0 tid=0x00007f575225a000 nid=0x1d8bae 
> waiting for monitor entry [0x00007f571ed76000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2004)
>     - waiting to lock <0x00000006b79baf00> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:699)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:645)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:730)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:539)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:696)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> What seemed to be blocking it was:
> {code}
> "DataXceiver for client DFSClient_NONMAPREDUCE_672634602_52 at 
> /10.71.224.51:37620 [Sending block 
> BP-551684957-10.71.224.13-1505987946791:blk_1683565334_609907893]" #142018893 
> daemon prio=5 os_prio=0 tid=0x00007f573cd25000 nid=0x1912a runnable 
> [0x00007f570acea000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
>     - locked <0x00000006b79baf00> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFile(FsDatasetImpl.java:2199)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:1954)
>     - locked <0x00000006b79baf00> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:709)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:702)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:220)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:301)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
>     at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
>     at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> Out of curiosity, we restarted 3 of the DNs. In the 24 hours since the 
> restart, no messages like the above were logged, whereas they were logged 
> regularly before the restart.
> I have quite a few jstacks taken at 5 second intervals for some of the DNs on 
> this cluster, and many samples have a runnable thread with this line at the 
> top of the stack, which seems too frequent for a simple get operation:
> {code}
> at org.apache.hadoop.hdfs.util.FoldedTreeSet.get(FoldedTreeSet.java:449)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap.get(ReplicaMap.java:111)
> - locked <0x00000006b79baf00> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> {code}
> After the restart, the jstacks never showed this line in any samples for at 
> least the next week while we were monitoring the cluster.
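> For anyone wanting to repeat this kind of sampling, a minimal sketch is below 
> (not the tooling actually used here): it shells out to the stock jstack 
> command every 5 seconds and counts how many dumps contain a FoldedTreeSet.get 
> frame. The PID argument and the frame string to match are assumptions to 
> adapt per process.
> {code}
> import java.io.BufferedReader;
> import java.io.InputStreamReader;
> import java.nio.charset.StandardCharsets;
> 
> // Hypothetical helper, not the script used for this report: samples a target
> // JVM with the stock jstack tool every 5 seconds and counts how many dumps
> // contain a frame in FoldedTreeSet.get, as a rough indicator of whether the
> // degradation described above is present on a given DN or NN.
> public class FoldedTreeSetSampler {
>   public static void main(String[] args) throws Exception {
>     String pid = args[0];  // PID of the DataNode/NameNode process to sample
>     String needle = "org.apache.hadoop.hdfs.util.FoldedTreeSet.get";
>     int samples = 0;
>     int hits = 0;
>     while (true) {
>       Process p = new ProcessBuilder("jstack", pid)
>           .redirectErrorStream(true).start();
>       boolean hit = false;
>       try (BufferedReader r = new BufferedReader(
>           new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
>         String line;
>         while ((line = r.readLine()) != null) {
>           if (line.contains(needle)) {
>             hit = true;
>           }
>         }
>       }
>       p.waitFor();
>       samples++;
>       if (hit) {
>         hits++;
>       }
>       System.out.printf("samples=%d, with FoldedTreeSet.get=%d%n", samples, hits);
>       Thread.sleep(5000);  // 5 second sampling interval
>     }
>   }
> }
> {code}
> If the hit rate climbs over weeks of uptime and drops back to near zero after 
> a restart, that would mirror the behaviour described above.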
> Another interesting point is that the jstack always shows line 449. This is 
> the method being executed, with that line marked in the snippet below:
> {code}
>   public E get(Object obj, Comparator<?> cmp) {
>     Objects.requireNonNull(obj);
>     Node<E> node = root;
>     while (node != null) {
>       E[] entries = node.entries;
>       int leftIndex = node.leftIndex;
>       int result = compare(obj, entries[leftIndex], cmp);
>       if (result < 0) {
>         node = node.left;
>       } else if (result == 0) {
>         return entries[leftIndex];
>       } else {
>         int rightIndex = node.rightIndex;
>         if (leftIndex != rightIndex) {
>           result = compare(obj, entries[rightIndex], cmp);
>         }
>         if (result == 0) {
>           return entries[rightIndex];
>         } else if (result > 0) {
>           node = node.right;
>         } else {
>           int low = leftIndex + 1;
>           int high = rightIndex - 1;
>           while (low <= high) {
>             int mid = (low + high) >>> 1;
>             result = compare(obj, entries[mid], cmp);
>             if (result > 0) {
>               low = mid + 1;
>             } else if (result < 0) {
>               high = mid - 1;
>             } else {
>               return entries[mid];
>             }
>           }
>           return null;
>         }
>       }
>     } // *** This is line 449, which the jstack always has at the top of the stack.
>     return null;
>   }
> {code}
> I cannot explain why that line is always the "running line", unless it is 
> somehow related to safepoints in the JVM: jstack can only observe threads at 
> safepoints, and HotSpot emits safepoint polls on loop back-edges, which would 
> tend to attribute time spent anywhere in the search loop to the back-edge of 
> the outer while loop.
> Based on these two examples, and a few other similar issues I have 
> encountered in the namenode, I believe we have an issue with FoldedTreeSet 
> where it somehow degrades over time, to the point that a simple get operation 
> takes a relatively long time, and it does not seem to recover from this 
> without a process restart.
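> If that is right, it should be possible to reproduce it outside HDFS by 
> putting a FoldedTreeSet through sustained add/remove churn and timing get() 
> as the structure ages. A rough sketch of such a probe follows; it needs a 
> Hadoop version that still ships FoldedTreeSet, and the constructor and method 
> signatures (add, remove, and the get(Object, Comparator) shown above) are 
> assumptions taken from the quoted code and stack traces rather than checked 
> against every release.
> {code}
> import java.util.ArrayDeque;
> import java.util.Comparator;
> import java.util.Random;
> 
> import org.apache.hadoop.hdfs.util.FoldedTreeSet;
> 
> // Hypothetical probe, not part of this report: churn a FoldedTreeSet the way
> // block additions and deletes would, and watch whether lookup latency creeps
> // up as the structure ages while its size stays constant.
> public class FoldedTreeSetChurnProbe {
> 
>   public static void main(String[] args) {
>     Comparator<Long> cmp = Long::compare;
>     FoldedTreeSet<Long> set = new FoldedTreeSet<>(cmp);
>     ArrayDeque<Long> live = new ArrayDeque<>();
>     Random rnd = new Random(42);
>     long nextKey = 0;
> 
>     // Pre-populate with one million entries.
>     for (int i = 0; i < 1_000_000; i++) {
>       set.add(nextKey);
>       live.addLast(nextKey);
>       nextKey++;
>     }
> 
>     for (int round = 1; round <= 1000; round++) {
>       // Churn: remove the 10k oldest keys and insert 10k new ones, keeping
>       // the set size constant so any slowdown is due to ageing, not growth.
>       for (int i = 0; i < 10_000; i++) {
>         set.remove(live.removeFirst());
>         set.add(nextKey);
>         live.addLast(nextKey);
>         nextKey++;
>       }
> 
>       // Time a batch of random lookups against the current contents.
>       Long[] keys = live.toArray(new Long[0]);
>       long sink = 0;  // crude guard against the JIT eliminating the lookups
>       long start = System.nanoTime();
>       for (int i = 0; i < 100_000; i++) {
>         Long v = set.get(keys[rnd.nextInt(keys.length)], cmp);
>         if (v != null) {
>           sink += v;
>         }
>       }
>       long avgNanos = (System.nanoTime() - start) / 100_000;
>       System.out.printf("round %d: avg get() = %d ns (sink=%d)%n",
>           round, avgNanos, sink);
>     }
>   }
> }
> {code}
> If the average get() time climbs across rounds while the set size stays 
> constant, that would support the degradation theory; if it stays flat, the 
> trigger is probably something more specific to the NN/DN access pattern.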
> HDFS-13671 talks about reverting FoldedTreeSet (at least in the Namenode) for 
> performance reasons. However, I wonder whether the problems people have seen 
> with it are caused by this degradation over time rather than by its initial 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
