Error (failed to close file) when uploading large file + kill active NN + 
manual failover
-----------------------------------------------------------------------------------------

                 Key: HDFS-3031
                 URL: https://issues.apache.org/jira/browse/HDFS-3031
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
    Affects Versions: HA branch (HDFS-1623)
            Reporter: Stephen Chu
             Fix For: HA branch (HDFS-1623)
         Attachments: styx01_killNNfailover, styx01_uploadLargeFile

I executed section 3.4 of Todd's HA test plan. 
https://issues.apache.org/jira/browse/HDFS-1623
1. A large file upload is started.
2. While the file is being uploaded, the administrator kills the first NN and 
performs a failover.
3. After the file finishes being uploaded, it is verified for correct length 
and contents.
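
The check in step 3 could be scripted roughly as follows. This is only a sketch: it assumes the uploaded file has already been fetched back to a local path (for example with hadoop fs -get), and verify_copy is a hypothetical helper name, not part of Hadoop.

```shell
#!/bin/sh
# Sketch: compare an original file with a copy fetched back from HDFS.
# Assumes the copy is already on local disk (e.g. via `hadoop fs -get`).
# verify_copy is a hypothetical helper, not a Hadoop command.
verify_copy() {
    original=$1
    copy=$2

    # Length check: byte counts must match.
    # The arithmetic expansion strips any padding that wc may emit.
    orig_len=$(( $(wc -c < "$original") ))
    copy_len=$(( $(wc -c < "$copy") ))
    if [ "$orig_len" -ne "$copy_len" ]; then
        echo "length mismatch: $orig_len vs $copy_len"
        return 1
    fi

    # Content check: cmp -s is silent and exits non-zero if any byte differs.
    if ! cmp -s "$original" "$copy"; then
        echo "content mismatch"
        return 1
    fi

    echo "ok"
}
```

Checking the length first gives a cheap, specific failure message for a truncated upload before the byte-by-byte comparison runs.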

For the test, I used a VM template image, styx01:/home/schu/centos64-2-5.5.qcow2. 
styx01 hosted the active NN and styx02 hosted the standby NN.

In the log files I attached, you can see that on styx01 I began the file upload:
hadoop fs -put centos64-2-5.5.qcow2

After waiting several seconds, I kill -9'd the active NN on styx01 and manually 
failed over to the NN on styx02. I then ran into the exception below. (The rest 
of the stack trace is in the attached file styx01_uploadLargeFile.)

12/02/29 14:12:52 WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
put: Failed on local exception: java.io.EOFException; Host Details : local host is: "styx01.sf.cloudera.com/172.29.5.192"; destination host is: ""styx01.sf.cloudera.com":12020;
12/02/29 14:12:52 ERROR hdfs.DFSClient: Failed to close file /user/schu/centos64-2-5.5.qcow2._COPYING_
java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "styx01.sf.cloudera.com/172.29.5.192"; destination host is: ""styx01.sf.cloudera.com":12020;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
        at org.apache.hadoop.ipc.Client.call(Client.java:1145)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:188)
        at $Proxy9.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:302)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy10.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1097)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:973)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:455)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:830)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:762)


        
