Wei-Chiu Chuang created HDFS-11486:
--------------------------------------

             Summary: Client close() should not fail fast if the last block is being decommissioned
                 Key: HDFS-11486
                 URL: https://issues.apache.org/jira/browse/HDFS-11486
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Wei-Chiu Chuang
            Assignee: Wei-Chiu Chuang


If a DFS client closes a file while the replicas of its last block reside on DataNodes that are being decommissioned, the close() may fail if the decommissioning does not complete within a few seconds.

When a DataNode is being decommissioned, the NameNode marks the DN's state as DECOMMISSION_INPROGRESS, and blocks with replicas on these DataNodes immediately become under-replicated. A close() call that attempts to complete the last open block will fail if the number of live replicas is below the minimum replication factor, because replicas residing on the decommissioning DataNodes are not counted as live. For example, with a minimum replication factor of 2, if two of a block's three replicas are on decommissioning DataNodes, only one live replica remains and the complete call is rejected.

The client internally retries completing the last open block up to 5 times by default, which takes roughly 12 seconds. After that, close() throws an exception like the following, which is typically not handled properly by applications.

{noformat}
java.io.IOException: Unable to close file because the last blockBP-33575088-10.0.0.200-1488410554081:blk_1073741827_1003 does not have enough number of replicas.

        at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:864)
        at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:827)
        at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:793)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
        at org.apache.hadoop.hdfs.TestDecommission.testCloseWhileDecommission(TestDecommission.java:708)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}
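
The roughly 12 seconds above come from the client's completion retries. A possible client-side mitigation (my assumption, not a fix for the underlying problem) is to raise the retry count so close() waits longer before giving up; a minimal sketch, using a hypothetical path:

{code:java}
// Mitigation sketch (assumption, not the fix proposed here): give the client more
// time to complete the last block by raising the retry count used by
// DFSOutputStream#completeFile. The default of 5 retries with exponential backoff
// is what adds up to the ~12 seconds mentioned above.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PatientCloseExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // assumes fs.defaultFS points at the cluster
    // Default is 5; a larger value makes close() wait longer before giving up.
    conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 10);
    try (FileSystem fs = FileSystem.get(conf);
         FSDataOutputStream out = fs.create(new Path("/tmp/example"))) {  // hypothetical path
      out.writeBytes("hello");
    }  // close() happens here and now retries up to 10 times before throwing
  }
}
{code}

This only buys time, of course; if decommissioning takes minutes, close() still fails.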
Once the exception is thrown, the client usually does not attempt to close the file again, so the file remains open and the last block remains under-replicated.

Subsequently, the administrator ran the recoverLease tool to salvage the file, but the attempt failed because the block remained under-replicated. It is not clear why the block was never re-replicated, though.
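
For reference, the lease recovery the administrator attempted can also be issued programmatically through DistributedFileSystem#recoverLease; a minimal sketch, with a hypothetical path:

{code:java}
// Sketch only: ask the NameNode to recover the lease on the file that was left open.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class RecoverLeaseExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();        // assumes fs.defaultFS points at the cluster
    Path file = new Path("/tmp/file-left-open");     // hypothetical path
    try (FileSystem fs = FileSystem.get(conf)) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      // recoverLease() returns true once the lease is released and the file is closed.
      boolean recovered = dfs.recoverLease(file);
      System.out.println("lease recovered: " + recovered);
    }
  }
}
{code}

I believe newer releases also expose this as an {{hdfs debug recoverLease -path <path>}} command.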

In summary, I do not think close() should fail just because the last block is being decommissioned. The block has a sufficient number of replicas; it's just that some of them are on decommissioning DataNodes. Decommissioning should be transparent to clients.

This issue seems to be more prominent on very large clusters where the minimum replication factor is set to 2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
