Shuyan Zhang created HDFS-17227:
-----------------------------------

             Summary: EC: Fix bug in choosing targets when racks is not enough.
                 Key: HDFS-17227
                 URL: https://issues.apache.org/jira/browse/HDFS-17227
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Shuyan Zhang

*Bug description*

If all of the following hold:
1. There is a striped block blockinfo1 that has an excess replica on datanodeA.
2. blockinfo1 has an internal block that needs to be reconstructed.
3. The number of racks is less than the number of internal blocks of blockinfo1.

Then the NN may choose datanodeA as the target for reconstructing the internal block, leaving two internal blocks of blockinfo1 on datanodeA and causing confusion.

*Root cause and solution*

When `BlockPlacementPolicyRackFaultTolerant` is used to choose targets and there are not enough racks, `chooseEvenlyFromRemainingRacks` is called. Currently, `chooseEvenlyFromRemainingRacks` calls `chooseOnce` and passes `newExcludeNodes` as the parameter instead of `excludedNodes`. When we choose targets for reconstructing internal blocks, `newExcludeNodes` only includes the datanodes that contain live replicas; it does not include datanodes that have excess replicas. As a result, a datanode with an excess replica may be chosen. I don't think we need `newExcludeNodes` here; we can just pass `excludedNodes` to `chooseOnce`.
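
To make the effect of the two exclusion sets concrete, here is a minimal toy sketch. It is not the Hadoop code: `ExcludeSetSketch` and `chooseTarget` are hypothetical names, datanodes are plain strings, and rack handling is left out entirely. It assumes, as the description above implies, that `excludedNodes` covers datanodes with excess replicas while `newExcludeNodes` covers only datanodes with live replicas.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: these names are hypothetical and are not the Hadoop API.
class ExcludeSetSketch {

    /** Returns the first candidate not in the exclusion set, or null if none is left. */
    static String chooseTarget(List<String> candidates, Set<String> excluded) {
        for (String node : candidates) {
            if (!excluded.contains(node)) {
                return node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> datanodes = List.of("datanodeA", "datanodeB", "datanodeC");

        // datanodeB holds a live internal block of blockinfo1.
        Set<String> liveReplicaNodes = Set.of("datanodeB");
        // datanodeA holds an excess replica of blockinfo1.
        Set<String> excessReplicaNodes = Set.of("datanodeA");

        // Plays the role of newExcludeNodes: live-replica nodes only.
        Set<String> narrowExclude = new LinkedHashSet<>(liveReplicaNodes);

        // Plays the role of excludedNodes: live-replica and excess-replica nodes.
        Set<String> fullExclude = new LinkedHashSet<>(liveReplicaNodes);
        fullExclude.addAll(excessReplicaNodes);

        // With the narrow set, datanodeA is still eligible and gets picked.
        System.out.println("narrow set picks: " + chooseTarget(datanodes, narrowExclude));
        // With the full set, datanodeA is skipped and datanodeC is picked.
        System.out.println("full set picks:   " + chooseTarget(datanodes, fullExclude));
    }
}
```

In this toy model the only difference between the buggy and the proposed behaviour is which set is handed to the selection step, which mirrors the proposed change of passing `excludedNodes` to `chooseOnce`.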