Fei Hui created HDFS-14920:
------------------------------

             Summary: Erasure Coding: Decommission may hang If one or more 
datanodes are out of service during decommission  
                 Key: HDFS-14920
                 URL: https://issues.apache.org/jira/browse/HDFS-14920
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ec
    Affects Versions: 3.1.3, 3.2.1, 3.0.3
            Reporter: Fei Hui


A decommission test hangs in our clusters.
We have seen the following messages:
{quote}
2019-10-22 15:58:51,514 TRACE 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block 
blk_-9223372035600425840_372987973 numExpected=9, numLive=5
2019-10-22 15:58:51,514 INFO BlockStateChange: Block: 
blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, 
corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, 
maintenance replicas: 0, live entering maintenance replicas: 0, excess 
replicas: 0, Is Open File: false, Datanodes having this block: 
10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 
10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 
10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current datanode 
decommissioning: true, Is current datanode entering maintenance: false
2019-10-22 15:58:51,514 DEBUG 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node 
10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate to 
finish Decommission In Progress
{quote}

After digging through the source code and the cluster logs, I suspect it 
happens in the following steps.
# The EC policy is RS-6-3-1024k.
# EC block b consists of internal blocks b0, b1, b2, b3, b4, b5, b6, b7, b8; 
b0 is on datanode dn0, b1 is on datanode dn1, and so on.
# At the beginning dn0 is decommissioning. b0 is replicated successfully, but 
dn0 is still in Decommission In Progress.
# Later the datanodes holding b1, b2, and b3 start decommissioning, and dn4, 
which holds b4, goes out of service. b4 therefore needs to be reconstructed, 
and an ErasureCodingWork is created for it with additionalReplRequired = 4.
# Because hasAllInternalBlocks is false, the NameNode calls 
ErasureCodingWork#addTaskToDatanode -> 
DatanodeDescriptor#addBlockToBeErasureCoded and sends a 
BlockECReconstructionInfo task to the DataNode.
# The DataNode cannot reconstruct the block because the number of targets is 
4, which is greater than 3 (the number of parity blocks).

The problem is in the following code from BlockManager.java:
{code}
      // should reconstruct all the internal blocks before scheduling
      // replication task for decommissioning node(s).
      if (additionalReplRequired - numReplicas.decommissioning() -
          numReplicas.liveEnteringMaintenanceReplicas() > 0) {
        additionalReplRequired = additionalReplRequired -
            numReplicas.decommissioning() -
            numReplicas.liveEnteringMaintenanceReplicas();
      }
{code}
Reconstruction should happen first, followed by replication for the 
decommissioning nodes. Here numReplicas.decommissioning() is 4 and 
additionalReplRequired is 4, so the subtraction yields 0, the condition is 
false, and additionalReplRequired stays at 4. That is wrong.
numReplicas.decommissioning() should be 3: it should exclude the 
decommissioning replica that already has a live copy. additionalReplRequired 
would then be 1, reconstruction would be scheduled as expected, and 
decommission would proceed.
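
The arithmetic above can be checked with a small standalone sketch. It is not 
the BlockManager code: the class EcDecommissionSketch and the method adjust 
are hypothetical names that only mirror the quoted condition, and the input 
values (additionalReplRequired = 4, decommissioning = 4 or 3, 3 parity blocks) 
are taken from this report.

```java
// Illustrative sketch only; class and method names are hypothetical and
// are not part of org.apache.hadoop.hdfs.server.blockmanagement.
public class EcDecommissionSketch {

    // Mirrors the quoted adjustment: subtract the decommissioning and
    // live-entering-maintenance counts only if the result stays positive.
    public static int adjust(int additionalReplRequired,
                             int decommissioning,
                             int liveEnteringMaintenance) {
        int remaining = additionalReplRequired
            - decommissioning - liveEnteringMaintenance;
        return remaining > 0 ? remaining : additionalReplRequired;
    }

    public static void main(String[] args) {
        int parityBlocks = 3;           // RS-6-3-1024k
        int additionalReplRequired = 4; // value reported in the issue

        // Current behaviour: all 4 decommissioning replicas are counted.
        int current = adjust(additionalReplRequired, 4, 0);
        // Proposed behaviour: the replica that already has a live copy
        // is excluded, leaving 3.
        int proposed = adjust(additionalReplRequired, 3, 0);

        System.out.println("current targets = " + current
            + ", reconstructable = " + (current <= parityBlocks));
        System.out.println("proposed targets = " + proposed
            + ", reconstructable = " + (proposed <= parityBlocks));
    }
}
```

With a count of 4 the adjustment is skipped and the task keeps 4 targets, more 
than the 3 parity blocks allow; with a count of 3 it drops to a single target.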
 



