Goodness Ayinmode created HDFS-17639:
----------------------------------------

             Summary: Lock contention for hasStorageType when the number of 
storage nodes is large
                 Key: HDFS-17639
                 URL: https://issues.apache.org/jira/browse/HDFS-17639
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode, server
    Affects Versions: 3.4.0
            Reporter: Goodness Ayinmode


Lock contention for hasStorageType when the number of storage nodes is large

 

Hi,

I was looking into methods associated with storages and storageTypes. I found 
[DatanodeDescriptor.hasStorageType|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L1138]
 could be a source of bottlenecks. To check whether a specific storage type 
exists among the storage locations associated with a DatanodeDescriptor, 
[hasStorageType|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L1138]
 iterates over an array of DatanodeStorageInfos returned by 
[getStorageInfos()|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L305].
 getStorageInfos() retrieves the storage information from a storageMap and 
converts it to an array while holding a lock. As the system scales and the size 
of storageMap grows with more datanodes, the time spent in the synchronized 
block will increase. This issue could become more significant when 
hasStorageType is called from methods like 
[DatanodeDescriptor.pruneStorageMap|https://github.com/apache/hadoop/blob/49a495803a9451850b8982317e277b605c785587/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L568]
 that could themselves iterate over a large data structure, resulting in a form 
of nested iteration. The combination of a repeated linear search (within 
hasStorageType) and iteration under a lock can lead to potentially quadratic 
complexity and significant synchronization bottlenecks.
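
To make the pattern concrete, here is a minimal, self-contained sketch of what 
I am describing. It is not the Hadoop source: StorageScanSketch, Node and 
StorageInfo are hypothetical stand-ins for DatanodeDescriptor and 
DatanodeStorageInfo, and the enum is a simplified stand-in for StorageType. It 
only models the copy-under-lock followed by a linear scan.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class StorageScanSketch {
  // Hypothetical, simplified stand-in for Hadoop's StorageType.
  enum StorageType { DISK, SSD, ARCHIVE }

  // Hypothetical stand-in for DatanodeStorageInfo.
  static final class StorageInfo {
    private final String id;
    private final StorageType type;

    StorageInfo(String id, StorageType type) {
      this.id = id;
      this.type = type;
    }

    String getId() { return id; }
    StorageType getStorageType() { return type; }
  }

  // Hypothetical stand-in for DatanodeDescriptor.
  static final class Node {
    private final Map<String, StorageInfo> storageMap = new HashMap<>();

    void addStorage(StorageInfo s) {
      synchronized (storageMap) {
        storageMap.put(s.getId(), s);
      }
    }

    // Copies every value into a fresh array while holding the lock,
    // so each call costs O(n) inside the synchronized block.
    StorageInfo[] getStorageInfos() {
      synchronized (storageMap) {
        Collection<StorageInfo> storages = storageMap.values();
        return storages.toArray(new StorageInfo[0]);
      }
    }

    // Linear scan over the copy returned by getStorageInfos().
    boolean hasStorageType(StorageType type) {
      for (StorageInfo s : getStorageInfos()) {
        if (s.getStorageType() == type) {
          return true;
        }
      }
      return false;
    }
  }

  public static void main(String[] args) {
    Node node = new Node();
    node.addStorage(new StorageInfo("s1", StorageType.DISK));
    node.addStorage(new StorageInfo("s2", StorageType.SSD));

    // A caller that loops over the storages and calls hasStorageType inside
    // the loop triggers one locked O(n) copy per iteration: O(n^2) overall.
    for (StorageInfo s : node.getStorageInfos()) {
      System.out.println(s.getId() + " " + node.hasStorageType(s.getStorageType()));
    }
  }
}
```

In this model, every call to hasStorageType takes the lock and copies the whole 
map, even when the first entry already matches.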

 

[DFSNetworkTopology.chooseRandomWithStorageType|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/DFSNetworkTopology.java#L180]
 and 
[DFSNetworkTopology.chooseRandomWithStorageTypeTwoTrial|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/DFSNetworkTopology.java#L107]
 are affected because they both invoke hasStorageType. Additionally, 
[INodeFile.assertAllBlocksComplete|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java#L345]
 and 
[BlockManager.checkRedundancy()|https://github.com/apache/hadoop/blob/6be04633b55bbd67c2875e39977cd9d2308dc1d1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L5018]
 face a similar issue 
([FSNamesystem.finalizeINodeFileUnderConstruction|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3908]
 invokes both methods under a writeLock).


This appears to be similar to 
https://issues.apache.org/jira/browse/HDFS-17638 . I’m curious to know whether 
my analysis is wrong and whether anything can be done to reduce the impact of 
these issues.
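
As one possible direction (purely an assumption on my part, not something that 
exists in the Hadoop source), the node could keep a per-type counter that is 
updated under the same lock as the map, so that the check no longer needs to 
copy or scan the storages. TypeCountSketch, Node, addStorage and removeStorage 
below are hypothetical names used only for illustration.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

public class TypeCountSketch {
  // Hypothetical, simplified stand-in for Hadoop's StorageType.
  enum StorageType { DISK, SSD, ARCHIVE }

  // Hypothetical stand-in for DatanodeDescriptor; storages are reduced
  // to an id -> type mapping to keep the sketch short.
  static final class Node {
    private final Map<String, StorageType> storageMap = new HashMap<>();
    // Number of storages currently registered for each type.
    private final EnumMap<StorageType, Integer> typeCounts =
        new EnumMap<>(StorageType.class);

    void addStorage(String id, StorageType type) {
      synchronized (storageMap) {
        StorageType old = storageMap.put(id, type);
        if (old != null) {
          decrement(old); // the id was re-registered, drop its old type
        }
        typeCounts.merge(type, 1, Integer::sum);
      }
    }

    void removeStorage(String id) {
      synchronized (storageMap) {
        StorageType old = storageMap.remove(id);
        if (old != null) {
          decrement(old);
        }
      }
    }

    // Still takes the lock, but does a constant-time lookup
    // instead of copying and scanning every storage.
    boolean hasStorageType(StorageType type) {
      synchronized (storageMap) {
        return typeCounts.containsKey(type);
      }
    }

    // Caller must hold the storageMap lock. Removes the entry when the
    // count would reach zero so containsKey stays accurate.
    private void decrement(StorageType type) {
      typeCounts.computeIfPresent(type, (k, v) -> v == 1 ? null : v - 1);
    }
  }

  public static void main(String[] args) {
    Node node = new Node();
    node.addStorage("s1", StorageType.DISK);
    node.addStorage("s2", StorageType.DISK);
    node.removeStorage("s1");
    System.out.println(node.hasStorageType(StorageType.DISK));
    node.removeStorage("s2");
    System.out.println(node.hasStorageType(StorageType.DISK));
  }
}
```

The trade-off is extra bookkeeping on every add and remove, so whether this is 
worthwhile would depend on how large storageMap actually gets and how often 
hasStorageType is called.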



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
