Chenyu Zheng created HDFS-17516:
-----------------------------------

             Summary: Erasure Coding: Some reconstruction blocks and metrics 
are inaccuracy when decommission DN  which contains many EC blocks.
                 Key: HDFS-17516
                 URL: https://issues.apache.org/jira/browse/HDFS-17516
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Chenyu Zheng
            Assignee: Chenyu Zheng
         Attachments: 截屏2024-05-09 下午3.59.22.png, 截屏2024-05-09 下午3.59.44.png

When decommissioning a DN that contains many EC blocks, the DN is marked as 
busy by scheduleReconstruction, so ErasureCodingWork::addTaskToDatanode does 
not add any block to ecBlocksToBeReplicated. 
Although no DNA_TRANSFER BlockCommand is generated for such a block, 
pendingReconstruction and neededReconstruction are still updated, and the 
BlockManager mistakenly believes that the block is being copied.
The periodic increases in the metrics 
`fs_namesystem_num_timed_out_pending_reconstructions` and 
`fs_namesystem_under_replicated_blocks` are consistent with this. In fact, 
many blocks are never actually copied. They stay in pendingReconstruction 
until they time out and are then re-added to neededReconstruction.
!截屏2024-05-09 下午3.59.44.png!!截屏2024-05-09 下午3.59.22.png!
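
The bookkeeping problem described above can be illustrated with a small, 
self-contained toy model. This is only a sketch and not Hadoop code: the 
class BusySourceSketch, the ToyDatanode type, its busy flag, and the 
scheduleGuarded variant are hypothetical names introduced for illustration. 
Only the queue names are borrowed from the description. The guarded variant 
shows one possible way to keep the queues consistent and is not necessarily 
how the issue is or will be fixed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the bookkeeping described in this issue; not Hadoop code. */
public class BusySourceSketch {

    /** Hypothetical stand-in for a DataNode and its queue of EC copy tasks. */
    static class ToyDatanode {
        final boolean busy; // assumption: set when the node is considered busy
        final List<Long> ecBlocksToBeReplicated = new ArrayList<>();

        ToyDatanode(boolean busy) {
            this.busy = busy;
        }
    }

    final Set<Long> neededReconstruction = new HashSet<>();
    final Set<Long> pendingReconstruction = new HashSet<>();

    /** Adds a copy task unless the source is busy; returns whether it was added. */
    private static boolean addTask(long blockId, ToyDatanode source) {
        if (source.busy) {
            return false; // nothing queued, so no transfer command would be sent
        }
        source.ecBlocksToBeReplicated.add(blockId);
        return true;
    }

    /** Reported behaviour: the queues are updated even when no task was added. */
    boolean scheduleAsReported(long blockId, ToyDatanode source) {
        boolean taskAdded = addTask(blockId, source);
        neededReconstruction.remove(blockId);
        pendingReconstruction.add(blockId);
        return taskAdded;
    }

    /** Hypothetical guard: leave the queues untouched when no task was added. */
    boolean scheduleGuarded(long blockId, ToyDatanode source) {
        if (!addTask(blockId, source)) {
            return false;
        }
        neededReconstruction.remove(blockId);
        pendingReconstruction.add(blockId);
        return true;
    }

    public static void main(String[] args) {
        ToyDatanode busyDn = new ToyDatanode(true);

        BusySourceSketch reported = new BusySourceSketch();
        reported.neededReconstruction.add(1L);
        reported.scheduleAsReported(1L, busyDn);
        System.out.println("as reported: pending=" + reported.pendingReconstruction
            + " tasks=" + busyDn.ecBlocksToBeReplicated);

        BusySourceSketch guarded = new BusySourceSketch();
        guarded.neededReconstruction.add(1L);
        guarded.scheduleGuarded(1L, busyDn);
        System.out.println("guarded: needed=" + guarded.neededReconstruction
            + " pending=" + guarded.pendingReconstruction);
    }
}
```

In the first case the block ends up in pendingReconstruction although no 
task exists for it, so in this model it could only leave that queue by 
timing out. In the guarded case it stays in neededReconstruction and can be 
picked up by a later scheduling pass.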
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

