Kevin Wikant created HDFS-16064:
-----------------------------------

             Summary: HDFS-721 causes DataNode decommissioning to get stuck 
indefinitely
                 Key: HDFS-16064
                 URL: https://issues.apache.org/jira/browse/HDFS-16064
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, namenode
    Affects Versions: 3.2.1
            Reporter: Kevin Wikant


Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
non-issue under the assumption that if the namenode & a datanode get into an 
inconsistent state for a given block pipeline, there should be another datanode 
available to replicate the block to

While testing datanode decommissioning using "dfs.exclude.hosts", I have 
encountered a scenario where the decommissioning gets stuck indefinitely

Below is the progression of events:
 * there are initially 4 datanodes DN1, DN2, DN3, DN4
 * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
 * HDFS block(s) on DN1 & DN2 must now be replicated to DN3 & DN4 in order to 
satisfy their minimum replication factor of 2
 * during this replication process 
https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes the 
following inconsistent state:
 ** DN3 thinks it has the block pipeline in FINALIZED state
 ** the namenode does not think DN3 has the block pipeline

{code:java}
2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
(DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
dst: /DN3:9866; 
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
{code}
 * the replication is attempted again, but:
 ** DN4 has the block
 ** DN1 and/or DN2 have the block, but don't count towards the minimum 
replication factor because they are being decommissioned
 ** DN3 does not have the block & cannot have the block replicated to it 
because of HDFS-721
 * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
fails, this continues indefinitely
 * therefore DN4 is the only live datanode with the block & the minimum 
replication factor of 2 cannot be satisfied
 * because the minimum replication factor cannot be satisfied for the block(s) 
being moved off DN1 & DN2, the datanode decommissioning can never be completed

 
{code:java}
2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): Block: 
blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
decommissioned replicas: 0, decommissioning replicas: 2, maintenance replicas: 
0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: 
false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , Current 
Datanode: DN1:9866, Is current datanode decommissioning: true, Is current 
datanode entering maintenance: false
...
2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): Block: 
blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
decommissioned replicas: 0, decommissioning replicas: 2, maintenance replicas: 
0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: 
false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , Current 
Datanode: DN2:9866, Is current datanode decommissioning: true, Is current 
datanode entering maintenance: false
{code}
Being stuck in decommissioning state forever is not an intended behavior of 
DataNode decommissioning

A few potential solutions:
 * Address the root cause of the problem which is an inconsistent state between 
namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
 * Detect when datanode decommissioning is stuck due to lack of available 
datanodes for satisfying the minimum replication factor, then recover by 
re-enabling the datanodes being decommissioned

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

Reply via email to