Andrew Wang created HDFS-12567:
----------------------------------

             Summary: BlockPlacementPolicyRackFaultTolerant fails with racks with very few nodes
                 Key: HDFS-12567
                 URL: https://issues.apache.org/jira/browse/HDFS-12567
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: erasure-coding
    Affects Versions: 3.0.0-alpha1
            Reporter: Andrew Wang


Found this while testing on an internal cluster with an unusual setup: one 
rack with ~20 nodes, plus a few more racks with just a few nodes each. Block 
placement would fail to get (# data blocks) datanodes even though there were 
plenty of DNs available on the 20-node rack.

I managed to reproduce the same issue in a unit test. The stack trace looks like this:

{noformat}
java.io.IOException: File /testfile0 could only be written to 5 of the 6 required nodes for RS-6-3-1024k. There are 9 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2083)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:286)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2609)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:863)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:548)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
{noformat}
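
For illustration only, here is one way the numbers in the trace could arise. 
The sketch below is a simplified model, not the Hadoop implementation: it 
assumes the policy caps each rack at an even share of the block group 
(ceil(blocks / racks)) and never goes above that cap. The class name, method 
names, and the 7/1/1 topology are all hypothetical; the report does not state 
the test topology or the root cause.

```java
// Simplified model only; this is NOT the BlockPlacementPolicyRackFaultTolerant code.
final class RackCapSketch {

    // Assumed even-spread cap: ceil(totalBlocks / numRacks), for positive inputs.
    static int maxNodesPerRack(int totalBlocks, int numRacks) {
        return (totalBlocks - 1) / numRacks + 1;
    }

    // Distinct datanodes that can be chosen if no rack may exceed the cap.
    static int placeable(int[] rackSizes, int totalBlocks) {
        int cap = maxNodesPerRack(totalBlocks, rackSizes.length);
        int chosen = 0;
        for (int size : rackSizes) {
            // A small rack contributes all of its nodes; a large rack is cut off at the cap.
            chosen += Math.min(size, cap);
        }
        return Math.min(chosen, totalBlocks);
    }

    public static void main(String[] args) {
        int[] skewed = {7, 1, 1}; // hypothetical topology: 9 datanodes on 3 racks
        int totalBlocks = 9;      // RS-6-3: 6 data + 3 parity blocks
        int dataBlocks = 6;       // minimum needed for the write to proceed

        int cap = maxNodesPerRack(totalBlocks, skewed.length);
        int got = placeable(skewed, totalBlocks);
        System.out.println("cap per rack = " + cap);
        System.out.println("placeable    = " + got + " (need at least " + dataBlocks + ")");
    }
}
```

Under these assumptions the large rack is capped at 3 nodes while the two 
small racks can only supply 1 each, so only 5 targets are found even though 9 
datanodes are running. An evenly sized 3/3/3 layout would place all 9.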

This isn't a very critical bug since it's an unusual rack configuration, but it 
can easily happen during testing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
