AMC-team created HDFS-15440:
-------------------------------
Summary: The using of dfs.disk.balancer.block.tolerance.percent is
inconsistent with doc
Key: HDFS-15440
URL: https://issues.apache.org/jira/browse/HDFS-15440
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer & mover
Reporter: AMC-team
In the HDFS disk balancer, the configuration parameter
"dfs.disk.balancer.block.tolerance.percent" defines a percentage that
determines when a move counts as good enough.
The description in hdfs-default.xml does not make clear to me how the value
is actually calculated and applied:
{quote}When a disk balancer copy operation is proceeding, the datanode is still
active. So it might not be possible to move the exactly specified amount of
data. So tolerance allows us to define a percentage which defines a good enough
move.{quote}
So I referred to the [official doc of HDFS disk
balancer|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html],
which describes it as follows:
bq. The tolerance percent specifies when we have reached a good enough value
for any copy step. For example, if you specify 10 then getting close to 10% of
the target value is good enough. It is to say if the move operation is 20GB in
size, if we can move 18GB (20 * (1-10%)) that operation is considered
successful.
However, the source code in DiskBalancer.java computes it differently:
{code:java}
// Inflates bytesCopied and returns true or false. This allows us to stop
// copying if we have reached close enough.
private boolean isCloseEnough(DiskBalancerWorkItem item) {
  long temp = item.getBytesCopied() +
      ((item.getBytesCopied() * getBlockTolerancePercentage(item)) / 100);
  return (item.getBytesToCopy() >= temp) ? false : true;
}
{code}
Here, if item.getBytesToCopy() is 20GB and the tolerance is 10, then
item.getBytesCopied() = 18GB is still not enough, because
18 + 18 * 0.1 = 19.8 and 20 >= 19.8, so isCloseEnough() returns false.
The calculation in isLessThanNeeded() (documented as "Checks if a given block
is less than needed size to meet our goal.") is unintuitive in the same way.
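The arithmetic above can be reproduced outside Hadoop with a minimal sketch. The class ToleranceArithmetic and its plain long parameters are hypothetical stand-ins for DiskBalancerWorkItem; only the formula is taken from the isCloseEnough() snippet quoted above.
{code:java}
// Standalone sketch, not Hadoop code: the class name and the plain long
// parameters are assumptions made for illustration only.
final class ToleranceArithmetic {
  static final long GB = 1024L * 1024L * 1024L;

  // Same arithmetic as the quoted isCloseEnough(): inflate bytesCopied by the
  // tolerance percentage, then compare it with bytesToCopy.
  static boolean isCloseEnough(long bytesToCopy, long bytesCopied,
      long tolerancePercent) {
    long inflated = bytesCopied + (bytesCopied * tolerancePercent) / 100;
    return bytesToCopy < inflated;
  }

  public static void main(String[] args) {
    long target = 20 * GB;
    // 18GB inflated by 10% is 19.8GB, which is still below the 20GB target.
    System.out.println(isCloseEnough(target, 18 * GB, 10)); // false
    // 19GB inflated by 10% is 20.9GB, which exceeds the 20GB target.
    System.out.println(isCloseEnough(target, 19 * GB, 10)); // true
  }
}
{code}
With this formula the check only passes once bytesCopied exceeds bytesToCopy / 1.1, roughly 18.2GB for a 20GB move, rather than the 18GB given in the doc.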
*How to fix*
Although this may not lead to a severe failure, it would be better to make the
doc and the code consistent, and to refine the description in
hdfs-default.xml so that it is more precise and clear.
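For reference, one way to read the doc's "20 * (1-10%)" example as code is sketched below. This is only an illustration of the documented behavior under that reading, not a proposed patch; the class DocConsistentTolerance and the method isCloseEnoughPerDoc are hypothetical names that do not exist in Hadoop.
{code:java}
// Standalone sketch, not Hadoop code: names are hypothetical and the formula
// is an assumed reading of the doc's 20GB / 18GB example.
final class DocConsistentTolerance {
  static final long GB = 1024L * 1024L * 1024L;

  // True once bytesCopied reaches bytesToCopy * (1 - tolerancePercent / 100).
  static boolean isCloseEnoughPerDoc(long bytesToCopy, long bytesCopied,
      long tolerancePercent) {
    long allowedShortfall = (bytesToCopy * tolerancePercent) / 100;
    return bytesCopied >= bytesToCopy - allowedShortfall;
  }

  public static void main(String[] args) {
    // With a 10% tolerance, 18GB of a 20GB move is accepted, 17GB is not.
    System.out.println(isCloseEnoughPerDoc(20 * GB, 18 * GB, 10)); // true
    System.out.println(isCloseEnoughPerDoc(20 * GB, 17 * GB, 10)); // false
  }
}
{code}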
--
This message was sent by Atlassian Jira
(v8.3.4#803005)