Goodness Ayinmode created HDFS-17639:
----------------------------------------

             Summary: Lock contention for hasStorageType when the number of storage nodes is large
                 Key: HDFS-17639
                 URL: https://issues.apache.org/jira/browse/HDFS-17639
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode, server
    Affects Versions: 3.4.0
            Reporter: Goodness Ayinmode

Hi,

I was looking into methods associated with storages and storageTypes. I found that [DatanodeDescriptor.hasStorageType|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L1138] could be a source of bottlenecks.

To check whether a specific storage type exists among the storage locations associated with a DatanodeDescriptor, [hasStorageType|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L1138] iterates over an array of DatanodeStorageInfos returned by [getStorageInfos()|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L305]. getStorageInfos() retrieves the storage information from a storageMap and converts it to an array while holding a lock. As the system scales and the size of storageMap grows with more datanodes, the time spent in the synchronized block will increase.

This issue could become more significant when hasStorageType is called from methods like [DatanodeDescriptor.pruneStorageMap|https://github.com/apache/hadoop/blob/49a495803a9451850b8982317e277b605c785587/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L568], which may themselves iterate over a large data structure, resulting in a form of nested iteration.
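
To make the pattern concrete, here is a minimal, self-contained sketch. It is not Hadoop code: the class StorageTypeIndexSketch, its Type enum, and all method names are hypothetical stand-ins. hasStorageTypeByScan mirrors the copy-under-lock-then-scan pattern described above, and hasStorageTypeByIndex shows one possible alternative, a per-type counter updated whenever storages are added or removed. Whether such an index would fit DatanodeDescriptor is an assumption that would need to be checked against the real code.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a datanode descriptor; not Hadoop code. */
class StorageTypeIndexSketch {
  /** Hypothetical subset of storage types, for illustration only. */
  enum Type { DISK, SSD, ARCHIVE }

  // Storage ID -> storage type; guarded by "this".
  private final Map<String, Type> storageMap = new HashMap<>();
  // Number of storages per type; guarded by "this".
  private final Map<Type, Integer> typeCounts = new EnumMap<>(Type.class);

  synchronized void addStorage(String id, Type type) {
    Type previous = storageMap.put(id, type);
    if (previous != null) {
      decrement(previous); // the ID was re-registered with a new type
    }
    typeCounts.merge(type, 1, Integer::sum);
  }

  synchronized void removeStorage(String id) {
    Type removed = storageMap.remove(id);
    if (removed != null) {
      decrement(removed);
    }
  }

  // Caller must hold the lock. Returning null from the remapping
  // function removes the entry once its count reaches zero.
  private void decrement(Type type) {
    typeCounts.computeIfPresent(type, (t, n) -> n > 1 ? n - 1 : null);
  }

  /** Pattern from the report: copy to an array under the lock, then scan. */
  boolean hasStorageTypeByScan(Type type) {
    Type[] snapshot;
    synchronized (this) {
      snapshot = storageMap.values().toArray(new Type[0]);
    }
    for (Type t : snapshot) {
      if (t == type) {
        return true;
      }
    }
    return false;
  }

  /** Possible alternative: one map lookup, no copy and no scan. */
  synchronized boolean hasStorageTypeByIndex(Type type) {
    return typeCounts.containsKey(type);
  }
}
```

The indexed variant still takes the lock, but only for a single lookup, so the time spent holding it no longer grows with the size of storageMap. The cost is extra bookkeeping on every add and remove.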

The combination of a repeated linear search (within hasStorageType) and iteration while holding a lock can lead to high time complexity (potentially quadratic) and significant synchronization bottlenecks. [DFSNetworkTopology.chooseRandomWithStorageType|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/DFSNetworkTopology.java#L180] and [DFSNetworkTopology.chooseRandomWithStorageTypeTwoTrial|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/DFSNetworkTopology.java#L107] are affected because they both invoke hasStorageType.

Additionally, [INodeFile.assertAllBlocksComplete|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java#L345] and [BlockManager.checkRedundancy()|https://github.com/apache/hadoop/blob/6be04633b55bbd67c2875e39977cd9d2308dc1d1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L5018] face a similar issue ([FSNamesystem.finalizeINodeFileUnderConstruction|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3908] invokes both methods under a writeLock).

This appears to be similar to https://issues.apache.org/jira/browse/HDFS-17638.

I'm curious to know whether my analysis is wrong and whether anything can be done to reduce the impact of these issues.