Goodness Ayinmode created HDFS-17638:
----------------------------------------
Summary: Lock contention for DatanodeStorageInfo when the number of storage nodes is large
Key: HDFS-17638
URL: https://issues.apache.org/jira/browse/HDFS-17638
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode, server
Affects Versions: 3.4.0
Reporter: Goodness Ayinmode

Hi, I was looking into the DatanodeStorageInfo class, and I think some of its methods could cause issues at large scale. For example, to convert DatanodeStorageInfo objects into their respective DatanodeDescriptor and storage-ID forms, [DatanodeStorageInfo.toDatanodeInfos()|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java#L44] and [DatanodeStorageInfo.toStorageIDs()|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java#L61] iterate over the entire array of storage nodes.
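To make the linear cost concrete, here is a simplified sketch of what such conversion helpers look like. These are stand-in classes written for illustration, not the actual Hadoop code; the real DatanodeStorageInfo lives in org.apache.hadoop.hdfs.server.blockmanagement and returns DatanodeInfo objects rather than host strings:

```java
// Simplified stand-in for the Hadoop DatanodeStorageInfo class,
// illustrating why each conversion is O(n) in the number of storages.
class DatanodeStorageInfo {
    final String storageID;
    final String datanodeHost; // stand-in for the DatanodeDescriptor reference

    DatanodeStorageInfo(String storageID, String datanodeHost) {
        this.storageID = storageID;
        this.datanodeHost = datanodeHost;
    }

    // Walks the whole array: one full pass per call.
    static String[] toStorageIDs(DatanodeStorageInfo[] storages) {
        String[] ids = new String[storages.length];
        for (int i = 0; i < storages.length; i++) {
            ids[i] = storages[i].storageID;
        }
        return ids;
    }

    // Same shape as toDatanodeInfos(): another full pass over the array.
    static String[] toDatanodeHosts(DatanodeStorageInfo[] storages) {
        String[] hosts = new String[storages.length];
        for (int i = 0; i < storages.length; i++) {
            hosts[i] = storages[i].datanodeHost;
        }
        return hosts;
    }
}
```

A single pass is cheap on its own; the concern below is about where these passes happen.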
Each operation is linear on its own, but performance issues can arise when they are called under a lock, as in [bumpBlockGenerationStamp|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L5987], where [newLocatedBlock|https://github.com/apache/hadoop/blob/49a495803a9451850b8982317e277b605c785587/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L5461] calls both methods (bumpBlockGenerationStamp --> newLocatedBlock --> [newLocatedBlock|https://github.com/apache/hadoop/blob/49a495803a9451850b8982317e277b605c785587/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L5437] or [newLocatedStripedBlock|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L5435] --> toDatanodeInfos and toStorageIDs) while the writeLock is held. The situation is even worse when these methods are invoked repeatedly inside a loop, as in [createLocatedBlockList|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L1450] ([createLocatedBlocks|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L1601] --> createLocatedBlockList --> [createLocatedBlock|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L1487] --> newLocatedBlock or newLocatedStripedBlock --> toDatanodeInfos and toStorageIDs).
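The cost pattern of that looped call chain can be sketched as follows. This is hypothetical illustration code, not Hadoop's implementation: it only shows that when every per-block conversion runs inside one write lock, the critical section grows to O(blocks x storages) and blocks all other lock holders for that whole duration:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration of the locking pattern described above.
// The entire conversion of every block's storages runs inside a single
// writeLock (analogous to the FSNamesystem writeLock), so the critical
// section costs O(blocks x storages) rather than O(1) or O(storages).
class LocatedBlockBuilder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    String[][] buildAll(String[][] storageIdsPerBlock) {
        lock.writeLock().lock(); // everything below holds the write lock
        try {
            String[][] out = new String[storageIdsPerBlock.length][];
            for (int b = 0; b < storageIdsPerBlock.length; b++) {
                // Analogous to createLocatedBlock --> toStorageIDs:
                // a full pass over this block's storages, once per block.
                String[] ids = storageIdsPerBlock[b];
                String[] copy = new String[ids.length];
                for (int i = 0; i < ids.length; i++) {
                    copy[i] = ids[i];
                }
                out[b] = copy;
            }
            return out; // total work under the lock: sum of all storages
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With many blocks and many storages per block, every other thread waiting on the same lock stalls for the full duration of this loop, which is the contention the issue describes.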
Such behavior causes significant synchronization bottlenecks when the number of blocks or storage nodes is large. [BlockPlacementPolicyDefault.getPipeline|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L1147], [BlockPlacementPolicyDefault.chooseTarget|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L287], and [BlockManager.validateReconstructionWork|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L2355] ([BlockManager.computeReconstructionWorkForBlocks|https://github.com/apache/hadoop/blob/49a495803a9451850b8982317e277b605c785587/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L2187] --> BlockManager.validateReconstructionWork --> [incrementBlocksScheduled|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java#L338]) face a similar lock-contention issue.

Please let me know if my analysis is wrong, or if there are suggestions to improve this. Thanks

--
This message was sent by Atlassian Jira
(v8.20.10#820010)