Decommissioning nodes not persisted between NameNode restarts
-------------------------------------------------------------
Key: HDFS-1271
URL: https://issues.apache.org/jira/browse/HDFS-1271
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Reporter: Travis Crawford
Datanodes in the process of being decommissioned should still be
decommissioning after a NameNode restart. Currently they are marked as dead
after a restart.
Details:
Nodes can be safely removed from a cluster by marking them as decommissioned
and waiting for their data to be replicated elsewhere. This is accomplished by
adding a node to the file referenced by dfs.hosts.exclude, then refreshing
nodes.
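For concreteness, the procedure looks roughly like this (the file path and
hostname are illustrative; the property and command names assume a stock
Hadoop installation):

    <!-- hdfs-site.xml: point the NN at the exclude file (illustrative path) -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.exclude</value>
    </property>

    # Add the host to the exclude file, then have the NN re-read it:
    echo "dn1.example.com" >> /etc/hadoop/conf/dfs.exclude
    hadoop dfsadmin -refreshNodes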
Decommissioning means block reports from the decommissioned datanode are no
longer accepted by the NameNode, so for decommissioning to occur the NN must
already have a block report from that node. That is, a datanode can
transition: live --> decommissioning --> dead. Nodes can NOT transition:
dead --> decommissioning --> dead.
Operationally this is problematic because manual intervention is required if
the NN restarts while nodes are decommissioning: in-house administration tools
must be more complex, or, more likely, admins have to babysit the
decommissioning process.
Someone more familiar with the code might have a better idea, but perhaps the
first block report from dfs.hosts.exclude hosts should be accepted?
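For illustration only, here is a toy model in plain Java of what accepting
that first report could look like -- this is NOT actual HDFS code, and every
class, method, and host name below is invented:

    import java.util.HashSet;
    import java.util.Set;

    /** Sketch of the NN's decision when a block report arrives from a host
     *  listed in the exclude file. Hypothetical names throughout. */
    public class ExcludedHostReportSketch {
        enum AdminState { LIVE, DECOMMISSIONING, DEAD }

        private final Set<String> excludedHosts = new HashSet<String>();
        private final Set<String> hostsWithReport = new HashSet<String>();

        void exclude(String host) { excludedHosts.add(host); }

        /** Handle a block report; returns the node's resulting state. */
        AdminState onBlockReport(String host, boolean acceptFirstReport) {
            boolean hasPriorReport = hostsWithReport.contains(host);
            if (excludedHosts.contains(host)) {
                if (!hasPriorReport && !acceptFirstReport) {
                    // Behavior described in this issue: with no prior report,
                    // an excluded node cannot start decommissioning.
                    return AdminState.DEAD;
                }
                // Proposed behavior: accept the first report so the node can
                // (re)enter DECOMMISSIONING after a NN restart.
                hostsWithReport.add(host);
                return AdminState.DECOMMISSIONING;
            }
            hostsWithReport.add(host);
            return AdminState.LIVE;
        }

        public static void main(String[] args) {
            ExcludedHostReportSketch nn = new ExcludedHostReportSketch();
            nn.exclude("dn1.example.com");
            // First report after a NN restart, current vs. proposed handling:
            System.out.println(nn.onBlockReport("dn1.example.com", false)); // DEAD
            System.out.println(nn.onBlockReport("dn1.example.com", true));  // DECOMMISSIONING
        }
    }

Whether to accept only the first report or all reports from excluded hosts is
left open here; either way, the point is that the exclude file, which does
survive a restart, could stand in for the lost in-memory decommissioning
state.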