Chenyu Zheng created HDFS-17516:
-----------------------------------
Summary: Erasure Coding: Some reconstruction blocks and metrics are inaccurate when decommissioning a DN that contains many EC blocks.
Key: HDFS-17516
URL: https://issues.apache.org/jira/browse/HDFS-17516
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Chenyu Zheng
Assignee: Chenyu Zheng
Attachments: 截屏2024-05-09 下午3.59.22.png, 截屏2024-05-09 下午3.59.44.png (screenshots taken 2024-05-09 at 3:59 PM)
When a DN that contains many EC blocks is being decommissioned, scheduleReconstruction marks this DN as busy, so ErasureCodingWork::addTaskToDatanode does not add any block to ecBlocksToBeReplicated. Although no DNA_TRANSFER BlockCommand is generated for such a block, pendingReconstruction and neededReconstruction are still updated, and the BlockManager mistakenly believes that the block is being copied. The periodic increases in the metrics `fs_namesystem_num_timed_out_pending_reconstructions` and `fs_namesystem_under_replicated_blocks` confirm this. In fact, many blocks are never actually copied: they remain in pendingReconstruction until they time out and are then re-added to neededReconstruction.

[Screenshots: 截屏2024-05-09 下午3.59.44.png, 截屏2024-05-09 下午3.59.22.png]
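The accounting mismatch described above can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not HDFS code: the class EcDecommissionSketch, its Node type and the methods tryAddTransfer, scheduleBuggy and scheduleFixed are hypothetical, and "busy" is modeled simply as a per-node cap on queued transfer tasks rather than the real BlockManager thresholds. It contrasts marking a block as pending unconditionally with marking it only when a transfer task was really queued; the latter is one possible way to keep the counters consistent, not necessarily the fix adopted for HDFS-17516.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of the pending/needed bookkeeping; not HDFS code.
public class EcDecommissionSketch {

  // Hypothetical stand-in for a decommissioning DataNode. "Busy" is assumed
  // to mean it already has maxStreams transfer tasks queued.
  static final class Node {
    final String name;
    final int maxStreams;
    final List<Long> transferTasks = new ArrayList<>();

    Node(String name, int maxStreams) {
      this.name = name;
      this.maxStreams = maxStreams;
    }

    boolean isBusy() {
      return transferTasks.size() >= maxStreams;
    }

    /** Queues a transfer task; returns true only if one was actually queued. */
    boolean tryAddTransfer(long blockId) {
      if (isBusy()) {
        return false;
      }
      transferTasks.add(blockId);
      return true;
    }
  }

  // Blocks the model believes are being copied (cf. pendingReconstruction).
  final Set<Long> pending = new HashSet<>();
  // Blocks still waiting to be scheduled (cf. neededReconstruction).
  final Queue<Long> needed = new ArrayDeque<>();

  /** Problematic pattern: pending is updated even if no task was queued. */
  void scheduleBuggy(Node source, long blockId) {
    source.tryAddTransfer(blockId); // result ignored
    pending.add(blockId);           // may record a copy that never happens
  }

  /** Consistent pattern: only mark pending when a task exists; else requeue. */
  void scheduleFixed(Node source, long blockId) {
    if (source.tryAddTransfer(blockId)) {
      pending.add(blockId);
    } else {
      needed.add(blockId);
    }
  }

  public static void main(String[] args) {
    EcDecommissionSketch buggy = new EcDecommissionSketch();
    Node dn1 = new Node("dn1", 2);
    for (long b = 1; b <= 5; b++) {
      buggy.scheduleBuggy(dn1, b);
    }
    // Pending entries without a task would only leave "pending" by timing out.
    System.out.println("buggy: tasks=" + dn1.transferTasks.size()
        + " pending=" + buggy.pending.size());

    EcDecommissionSketch fixed = new EcDecommissionSketch();
    Node dn2 = new Node("dn2", 2);
    for (long b = 1; b <= 5; b++) {
      fixed.scheduleFixed(dn2, b);
    }
    System.out.println("fixed: tasks=" + dn2.transferTasks.size()
        + " pending=" + fixed.pending.size()
        + " needed=" + fixed.needed.size());
  }
}
```

Running main prints `buggy: tasks=2 pending=5` and `fixed: tasks=2 pending=2 needed=3`. In the first case three blocks are counted as being copied although no transfer was queued for them, which in this model corresponds to pending entries that can only leave by timing out.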