Todd Lipcon created HDFS-4799:
---------------------------------
Summary: Corrupt replica can be prematurely removed from corruptReplicas map
Key: HDFS-4799
URL: https://issues.apache.org/jira/browse/HDFS-4799
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
We saw the following sequence of events in a cluster result in the loss of
every replica carrying a block's most recent genstamp:
- client is writing to a pipeline of 3
- nodes in the pipeline failed over some period of time, leaving 3
old-genstamp replicas on the original three nodes, while the pipeline recruited
3 new replicas with a later genstamp.
-- so we have 6 total replicas in the cluster: 3 with old genstamps on
downed nodes and 3 with the latest genstamp
- the cluster reboots, and the nodes with old genstamps send their block
reports first. The replicas are correctly added to the corrupt replicas map
because their genstamps are too old
- the nodes with the new genstamp then send their block reports. When the last
of them reports, chooseExcessReplicates is called and incorrectly decides to
remove the three good replicas, leaving only the old-genstamp replicas.
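
The sequence above can be mimicked in a toy model. The Python sketch below is
not HDFS code: Replica, choose_excess, and the "delete the most recently
reported candidates" policy are hypothetical stand-ins, picked only because
they ignore genstamps. It assumes, per the summary, that the entries for the
old-genstamp replicas leave the corrupt replicas map too early, so the chooser
sees 6 candidates for a replication factor of 3.

```python
from dataclasses import dataclass

LATEST_GENSTAMP = 2   # assumed genstamp of the good replicas
REPLICATION = 3       # pipeline of 3, as in the report


@dataclass(frozen=True)
class Replica:
    node: str
    genstamp: int


def choose_excess(reported, corrupt_nodes, replication=REPLICATION):
    """Toy stand-in for an excess-replica chooser (NOT the HDFS algorithm).

    Replicas on nodes listed in corrupt_nodes are skipped. Among the rest,
    the most recently reported ones are picked for deletion. That ordering
    is an arbitrary assumption; the point is that it never looks at genstamps.
    """
    candidates = [r for r in reported if r.node not in corrupt_nodes]
    excess = max(0, len(candidates) - replication)
    return candidates[len(candidates) - excess:]


# Block-report order after the reboot: old-genstamp nodes first, then new ones.
old = [Replica(f"old{i}", LATEST_GENSTAMP - 1) for i in range(3)]
new = [Replica(f"new{i}", LATEST_GENSTAMP) for i in range(3)]
reported = old + new

# Corrupt map intact: the stale replicas are excluded, so nothing is excess.
corrupt_nodes = {r.node for r in old}
safe = choose_excess(reported, corrupt_nodes)

# Corrupt map entries dropped too early: all 6 replicas look like candidates.
buggy = choose_excess(reported, set())
survivors = [r for r in reported if r not in buggy]
```

In this model, safe is empty, while buggy contains the three latest-genstamp
replicas and survivors holds only the stale ones, which matches the outcome
described above. Which replicas a real chooser would pick depends on its
actual policy; the sketch only shows that stale replicas must stay excluded
from the candidate set.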