Jing Zhao created HDFS-9837:
-------------------------------
Summary: BlockManager#countNodes should be able to detect
duplicated internal blocks
Key: HDFS-9837
URL: https://issues.apache.org/jira/browse/HDFS-9837
Project: Hadoop HDFS
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Currently {{BlockManager#countNodes}} only counts the number of
replicas/internal blocks, so it cannot detect the under-replicated scenario
where a striped EC block has 9 internal blocks but some of them are duplicates
of the same data/parity block. E.g., b8 is missing while two copies of b0 exist:
b0, b1, b2, b3, b4, b5, b6, b7, b0
If the NameNode keeps running, it is able to detect the duplication of b0 and
will put the duplicated internal block into the excess map. {{countNodes}}
excludes internal blocks captured in the excess map and thus returns the
correct number of live replicas.
However, if NN restarts before sending out the reconstruction command, the
missing internal block cannot be detected anymore. The following steps can
reproduce the issue:
# create an EC file
# kill DN1 and wait for the reconstruction to happen
# start DN1 again
# kill DN2 and restart NN immediately
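
To make the counting problem concrete, here is a minimal standalone sketch, not the actual {{BlockManager}} code or a proposed patch. It assumes a block group of 9 internal blocks as in the example above; the class name {{InternalBlockCount}} and the method {{countDistinct}} are hypothetical and exist only for illustration. It compares a plain count of reported internal blocks with a count of distinct internal block indices:

```java
import java.util.BitSet;

public class InternalBlockCount {
    /**
     * Hypothetical helper: returns how many distinct internal block
     * indices appear among the reported ones. Indices outside
     * [0, totalInternalBlocks) are ignored.
     */
    static int countDistinct(int[] reportedIndices, int totalInternalBlocks) {
        BitSet seen = new BitSet(totalInternalBlocks);
        for (int idx : reportedIndices) {
            if (idx >= 0 && idx < totalInternalBlocks) {
                seen.set(idx); // setting an already-set bit is a no-op
            }
        }
        return seen.cardinality();
    }

    public static void main(String[] args) {
        // The scenario from the description: b8 is missing, b0 is reported twice.
        int[] reported = {0, 1, 2, 3, 4, 5, 6, 7, 0};
        int total = 9; // assumed block group size, matching the example

        int naive = reported.length;                   // looks fully replicated
        int distinct = countDistinct(reported, total); // exposes the duplicate

        System.out.println("naive count    = " + naive);
        System.out.println("distinct count = " + distinct);
        System.out.println("missing        = " + (total - distinct));
    }
}
```

With the example input, the plain count is 9 while only 8 distinct indices are present, so one internal block still needs reconstruction. A bitmap keyed by internal block index is one simple way to tell the two cases apart without relying on the excess map.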