Shuyan Zhang created HDFS-17227:
-----------------------------------
Summary: EC: Fix bug in choosing targets when racks is not enough.
Key: HDFS-17227
URL: https://issues.apache.org/jira/browse/HDFS-17227
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Shuyan Zhang
*Bug description*
If,
1. There is a striped block blockinfo1, which has an excess replica on
datanodeA.
2. blockinfo1 has an internal block that needs to be reconstructed.
3. The number of racks is less than the number of internal blocks of blockinfo1.
Then the NN may choose datanodeA as the target for reconstructing the internal
block, resulting in two internal blocks of blockinfo1 on datanodeA, causing
confusion.
*Root cause and solution*
When we use `BlockPlacementPolicyRackFaultTolerant` to choose targets and the
number of racks is insufficient, `chooseEvenlyFromRemainingRacks` is called.
Currently, `chooseEvenlyFromRemainingRacks` calls `chooseOnce` with
`newExcludeNodes` as the parameter instead of `excludedNodes`. When we choose
targets for reconstructing internal blocks, `newExcludeNodes` only includes
those datanodes that contain live replicas, and does not include datanodes that
have excess replicas. As a result, a datanode with an excess replica may be
chosen.
I don't think we need `newExcludeNodes` here; we can just pass `excludedNodes`
to `chooseOnce`.
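To make the difference between the two exclude sets concrete, here is a minimal standalone sketch. It is not the actual Hadoop code: the class `ExcludeSetSketch`, the helper `candidates`, and the four-node cluster are hypothetical, and the real `chooseOnce` does much more than filter a list. The sketch only assumes what is described above, namely that one set holds just the datanodes with live replicas while the other also holds the datanode with the excess replica.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Hadoop.
class ExcludeSetSketch {

    // Stand-in for one placement round: every node not in the exclude set
    // is considered a valid reconstruction target.
    static List<String> candidates(List<String> allNodes, Set<String> excluded) {
        List<String> result = new ArrayList<>();
        for (String node : allNodes) {
            if (!excluded.contains(node)) {
                result.add(node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> cluster =
            Arrays.asList("datanodeA", "datanodeB", "datanodeC", "datanodeD");

        // Plays the role of newExcludeNodes: only nodes holding live replicas.
        Set<String> liveOnly =
            new HashSet<>(Arrays.asList("datanodeB", "datanodeC"));

        // Plays the role of excludedNodes: live replicas plus the node that
        // holds the excess replica (datanodeA).
        Set<String> full = new HashSet<>(liveOnly);
        full.add("datanodeA");

        // With the narrower set, datanodeA is still a candidate.
        System.out.println(candidates(cluster, liveOnly)); // [datanodeA, datanodeD]

        // With the full set, datanodeA can no longer be chosen.
        System.out.println(candidates(cluster, full));     // [datanodeD]
    }
}
```

With the narrower set, datanodeA remains eligible even though it already stores an internal block of the group, which matches the scenario in the bug description.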