[ https://issues.apache.org/jira/browse/HDFS-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wang resolved HDFS-6626.
-------------------------------
    Resolution: Not a Problem

Thanks for checking in on this, Ming. Since it seems like the dead node list is sufficient, let's close this JIRA out. Please reopen if a use case reemerges.

> Node is marked decommissioned if it becomes dead when it is being decommissioned
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-6626
>                 URL: https://issues.apache.org/jira/browse/HDFS-6626
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>
> Not sure if it is by design, but it isn't intuitive. The scenario is like
> this: you try to decommission a node; while the node is being decommissioned,
> it becomes dead from the NN's point of view; right after that, the NN marks
> the node decommissioned. On the web UI, administrators will conclude that the
> decommission has completed successfully. That is because decommission is
> considered done when there is no block left for the DN.
> {noformat}
> BlockManager.java
>   boolean isReplicationInProgress(DatanodeDescriptor srcNode) {
>     boolean status = false;
>     ...
>     final Iterator<? extends Block> it = srcNode.getBlockIterator();
>     while(it.hasNext()) {
>       ...
>       // set status if there is block under replication
>     }
>     ...
>     return status;
>   }
> {noformat}
> The question is whether we should mark the dead node as "decommission
> completed" (the current behavior) or as "decommission aborted". From the
> administrators' point of view, when they are doing decomm, they want to know
> the status of the decomm and the health of the decomm-in-progress nodes. If
> they can detect a decommission failure earlier, they might be able to take
> action earlier; for example, if the TOR switch has issues during decomm,
> administrators will be able to quickly spot a bunch of "decommission aborted"
> nodes from the same rack. People can still find this information by joining
> the decomm node list with the recent dead node list on the web UI; it is just
> not as convenient.
> Suggestions?

--
This message was sent by Atlassian JIRA
(v6.2#6252)