Hi all,
We are having difficulty writing any logs to an HDFS cluster of fewer than
three nodes. This started with the upgrade from CDH 4.2 to 4.3 (4.4
behaves the same way). Has anything changed that could cause this, and is
there anything we can do to rectify the situation, so that we can use a
single datanode once more?
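For context, our writer appends to log files under /scribe/logs/ through
the Hadoop FileSystem API. The snippet below is a minimal sketch of that
path, not our exact code; the host and file names are taken from the stack
traces further down:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendRepro {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(
                URI.create("hdfs://storage1.testing.swiftserve.com:9000"), conf);
            Path log = new Path(
                "/scribe/logs/test/log1.testing.swiftserve.com/test-2013-10-14_00000");
            // Create the file on first use, append thereafter; the errors
            // below are raised from this append/close path.
            FSDataOutputStream out =
                fs.exists(log) ? fs.append(log) : fs.create(log);
            out.write("test line\n".getBytes("UTF-8"));
            out.close();
            fs.close();
        }
    }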
The error log contains messages about "lease recovery" and "Failed to add
a datanode".
Here is an example stack trace:
java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[5.9.130.139:50010, 5.9.130.140:50010], original=[5.9.130.139:50010, 5.9.130.140:50010])
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:816)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:876)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:493)
FSDataOutputStream#close error:
java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[5.9.130.139:50010, 5.9.130.140:50010], original=[5.9.130.139:50010, 5.9.130.140:50010])
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:816)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:876)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:493)
hdfsOpenFile(hdfs://storage1.testing.swiftserve.com:9000/scribe/logs/test/log1.testing.swiftserve.com/test-2013-10-14_00000): FileSystem#append((Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/fs/FSDataOutputStream;) error:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/scribe/logs/test/log1.testing.swiftserve.com/test-2013-10-14_00000] for [DFSClient_NONMAPREDUCE_1056562813_1] on client [5.9.130.136], because this file is already being created by [DFSClient_NONMAPREDUCE_2007800327_1] on [5.9.130.136]
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2062)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1862)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2105)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2081)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:434)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:224)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44944)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
    at org.apache.hadoop.ipc.Client.call(Client.java:1231)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy9.append(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at $Proxy9.append(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:210)
    at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1352)
    at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1391)
    at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1379)
    at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:257)
    at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:81)
    at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1106)
The exception suggests changing
"dfs.client.block.write.replace-datanode-on-failure.policy", but we have
already set it to "NEVER" and Hadoop appears to ignore it.
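For reference, here is how we set the policy on the client side. This is a
sketch of the programmatic equivalent of our hdfs-site.xml entry (the
property name and value are exactly what we use; the rest is illustrative):

    // Imports as in the sketch near the top of this mail.
    Configuration conf = new Configuration();
    // NEVER: never try to replace a failed datanode in the write pipeline,
    // which, as we understand it, is the intended option for clusters with
    // fewer than three datanodes.
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy",
             "NEVER");
    FileSystem fs = FileSystem.get(
        URI.create("hdfs://storage1.testing.swiftserve.com:9000"), conf);

Is setting this in the client's configuration sufficient, or does it need
to be picked up somewhere else as well?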
Any help would be appreciated.
Thanks,
David Mankellow