Stephen O'Donnell created HDFS-14626:
----------------------------------------

             Summary: Decommission all nodes hosting last block of open file succeeds unexpectedly
                 Key: HDFS-14626
                 URL: https://issues.apache.org/jira/browse/HDFS-14626
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.3.0
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


I have been investigating scenarios that cause decommission to hang, especially 
around one long-standing issue: an open block on a host which is being 
decommissioned can cause the process to never complete.

Checking the history, there seems to have been at least one change in HDFS-5579 
which greatly improved the situation, but from reading comments and support 
cases, there still seem to be some scenarios where open blocks on a DN host 
cause the decommission to get stuck.

No matter what I try, I have not been able to reproduce this, but I think I 
have uncovered another issue that may partly explain why.

If I do the following, the nodes will decommission without any issues:

1. Create a file and write to it so it crosses a block boundary. Then there is 
one complete block and one under construction block. Keep the file open, and 
write a few bytes periodically.

2. Now note the nodes on which the UC block is currently being written, and 
decommission them all (a rough client-side sketch of steps 1 and 2 is included 
after the stack trace below).

3. The decommission should succeed.

4. Now attempt to close the open file. It will fail with an error like the one 
below, probably because decommissioned nodes are not allowed to send IBRs:

{code}
java.io.IOException: Unable to close file because the last block BP-646926902-192.168.0.20-1562099323291:blk_1073741827_1003 does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:968)
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:911)
    at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:894)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:849)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
{code}
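
For reference, steps 1 and 2 can be driven from a plain HDFS client along the lines of the sketch below. This is only a rough sketch of the reproduction, not the test case I will attach; the class name, path, chunk size and the use of getFileBlockLocations to find the UC block's nodes are placeholders, and the decommission itself is done out of band (exclude file plus refreshNodes).

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenFileDecomRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/openFileDecomRepro"); // arbitrary test path

    // Step 1: write past one block boundary so there is one complete block and
    // one under construction block, then keep the stream open.
    FSDataOutputStream out = fs.create(path);
    byte[] chunk = new byte[1024 * 1024];
    long blockSize = fs.getDefaultBlockSize(path);
    for (long written = 0; written <= blockSize; written += chunk.length) {
      out.write(chunk);
    }
    out.hflush();

    // Step 2: note the datanodes the last (under construction) block is being
    // written to. getFileBlockLocations is an approximation here; depending on
    // the version, the UC block's locations may only be visible after an hflush.
    BlockLocation[] locs = fs.getFileBlockLocations(path, 0, blockSize + chunk.length);
    for (String host : locs[locs.length - 1].getHosts()) {
      System.out.println("Decommission this host: " + host);
    }

    // Decommission those hosts out of band (exclude file + refreshNodes), while
    // continuing to write and hflush a few bytes periodically.

    // Step 4: once decommission has completed, the close below fails with the
    // "does not have enough number of replicas" IOException shown above.
    out.close();
  }
}
{code}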

Interestingly, if you recommission the nodes without restarting them before 
closing the file, it will close OK, and writes to it can continue even once 
decommission has completed.

I don't think this is expected behaviour; i.e., decommission should not be able 
to complete on all the nodes hosting the last UC block of a file.

From what I have figured out, I don't think UC blocks are considered in the 
DatanodeAdminManager at all. This is because the original list of blocks it 
cares about is taken from the datanode's block iterator, which takes them from 
the DatanodeStorageInfo objects attached to the datanode instance. I believe 
UC blocks don't make it into the DatanodeStorageInfo until after they have 
been completed and an IBR sent, so the decommission logic never considers them.
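
To illustrate where I think the gap is, the scan looks conceptually like the fragment below. This is only a simplification of my reading of the code, not the actual DatanodeAdminManager implementation; assume "node" is the DatanodeDescriptor of a node that is DECOMMISSION_INPROGRESS.

{code}
// Conceptual sketch only - not the real DatanodeAdminManager code.
// "node" is assumed to be the DatanodeDescriptor of a decommissioning datanode.
Iterator<BlockInfo> it = node.getBlockIterator();
while (it.hasNext()) {
  BlockInfo block = it.next();
  // ... check whether this block is sufficiently replicated before the node
  // can move from DECOMMISSION_INPROGRESS to DECOMMISSIONED ...
}
// The iterator is backed by the blocks registered against the node's
// DatanodeStorageInfo objects. A last block that is still under construction
// has not been added to any storage yet, so it never comes out of this
// iterator and is never checked.
{code}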

What troubles me about this explanation is this: if the decommission logic 
never checks for open blocks, how did open files previously cause decommission 
to get stuck? I suspect I am missing something.

I will attach a patch with a test case that demonstrates this issue. It 
reproduces on trunk, and I also tested on CDH 5.8.1, which is based on the 2.6 
branch but with a lot of backports.


