Hi,

When allocating a datanode for a DFSClient write with load consideration enabled, the
namenode checks whether a candidate datanode is overloaded by computing the
average xceiver count across all in-service datanodes. But if a datanode is
decommissioned and then becomes dead, it is still treated as in service,
which makes the average load much higher than the real one, especially when
the number of decommissioned datanodes is large. In our cluster of 180
datanodes, 100 are decommissioned, and the average load is 17. This failed
all datanode allocations.
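
For context, this is roughly how the load check behaves during target
selection (a simplified sketch based on my reading of
BlockPlacementPolicyDefault; LOAD_FACTOR and isOverloaded below are
illustrative names, not the actual Hadoop identifiers):

// Simplified sketch of the considerLoad check, not the actual Hadoop source.
// A candidate is rejected as a write target when its current xceiver count
// exceeds a small multiple of the cluster-wide in-service average.
public class LoadCheckSketch {
  // Assumption: a factor of 2.0 over the in-service average, as in the
  // "node is too busy" check during target selection.
  private static final double LOAD_FACTOR = 2.0;

  static boolean isOverloaded(int nodeXceiverCount, double inServiceXceiverAverage) {
    double maxLoad = LOAD_FACTOR * inServiceXceiverAverage;
    return nodeXceiverCount > maxLoad;
  }

  public static void main(String[] args) {
    // With a skewed in-service average, the threshold no longer reflects
    // the load on the nodes that actually serve traffic.
    System.out.println(isOverloaded(40, 17.0)); // true: 40 > 2 * 17
    System.out.println(isOverloaded(30, 17.0)); // false: 30 <= 2 * 17
  }
}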


I have created a JIRA, HDFS-12820, but I am not very sure about the
solution. My question is: why don't we exclude decommissioned datanodes from
nodesInService in the first place? If we simply remove this check, could
there be any side effect? Here is the relevant subtract() method:

private void subtract(final DatanodeDescriptor node) {
  capacityUsed -= node.getDfsUsed();
  blockPoolUsed -= node.getBlockPoolUsed();
  xceiverCount -= node.getXceiverCount();
  if (!(node.isDecommissionInProgress() || node.isDecommissioned())) {
    nodesInService--;
    nodesInServiceXceiverCount -= node.getXceiverCount();
    capacityTotal -= node.getCapacity();
    capacityRemaining -= node.getRemaining();
  } else {
    capacityTotal -= node.getDfsUsed();
  }
  cacheCapacity -= node.getCacheCapacity();
  cacheUsed -= node.getCacheUsed();
}
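
For reference, the counters maintained by add()/subtract() feed the
in-service average roughly like this (a minimal sketch; the class name and
the sample numbers are illustrative, not the actual Hadoop source):

// Minimal sketch of how the in-service xceiver average is derived from the
// counters above. If decommissioned-and-dead nodes are never subtracted,
// both numerator and denominator keep stale values and the average drifts
// away from the load on the live nodes.
public class InServiceStatsSketch {
  private int nodesInService;
  private long nodesInServiceXceiverCount;

  double getInServiceXceiverAverage() {
    return nodesInService == 0
        ? 0.0
        : (double) nodesInServiceXceiverCount / nodesInService;
  }

  public static void main(String[] args) {
    InServiceStatsSketch stats = new InServiceStatsSketch();
    stats.nodesInService = 180;              // includes 100 decommissioned, dead nodes
    stats.nodesInServiceXceiverCount = 3060; // includes their stale xceiver counts
    System.out.println(stats.getInServiceXceiverAverage()); // 17.0
  }
}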

-- 
Xie Gang
