[jira] [Created] (HDFS-16158) Discover datanodes with unbalanced volume usage by the standard deviation

tomscut (Jira) Tue, 10 Aug 2021 08:15:13 -0700

tomscut created HDFS-16158:
------------------------------

             Summary: Discover datanodes with unbalanced volume usage by the 
standard deviation 
                 Key: HDFS-16158
                 URL: https://issues.apache.org/jira/browse/HDFS-16158
             Project: Hadoop HDFS
          Issue Type: New Feature
            Reporter: tomscut
            Assignee: tomscut



Discover datanodes with unbalanced volume usage by the standard deviation

Disk usage on a datanode can become unbalanced in several scenarios:
1. A damaged disk is repaired and brought back online.
2. Disks are added to some datanodes.
3. Some disks are damaged, resulting in slow data writing.
4. A custom volume choosing policy is in use.

When disk usage is unbalanced, a sudden increase in datanode write traffic 
may make the disks with low volume usage I/O-busy, decreasing throughput 
across datanodes.

We therefore need to find these nodes promptly so that we can run diskBalance 
or take other action. From the usage of each volume on a datanode, we can 
calculate the standard deviation of the volume usage. The more unbalanced the 
volumes, the higher the standard deviation.
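
As a minimal sketch of this calculation, assuming usage is expressed as the 
used fraction of each volume's capacity and that the population standard 
deviation is meant (the issue specifies neither), a helper might look like 
the following. The class and method names are hypothetical and are not part 
of Hadoop.

```java
// Hypothetical helper, not Hadoop code: computes the standard deviation
// of per-volume usage ratios for a single datanode.
final class VolumeUsageStdDev {
    private VolumeUsageStdDev() {}

    /** Returns the used fraction (0.0 to 1.0) of each volume. */
    static double[] usageRatios(long[] capacityBytes, long[] usedBytes) {
        if (capacityBytes.length != usedBytes.length) {
            throw new IllegalArgumentException("array lengths differ");
        }
        double[] ratios = new double[capacityBytes.length];
        for (int i = 0; i < capacityBytes.length; i++) {
            // Assumption: a volume reporting zero capacity counts as 0% used.
            ratios[i] = capacityBytes[i] == 0
                ? 0.0
                : (double) usedBytes[i] / capacityBytes[i];
        }
        return ratios;
    }

    /** Population standard deviation; returns 0.0 for an empty array. */
    static double stdDev(double[] values) {
        if (values.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double squaredDiffs = 0.0;
        for (double v : values) {
            squaredDiffs += (v - mean) * (v - mean);
        }
        return Math.sqrt(squaredDiffs / values.length);
    }
}
```

For four equally sized volumes that are 20%, 40%, 60% and 80% full, this 
gives a standard deviation of about 0.224; for perfectly balanced volumes it 
gives 0.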

To avoid adding load to the namenode, we can calculate the standard 
deviation on the datanode side, transmit it to the namenode through the 
heartbeat, and display the result on the namenode Web UI. We can then sort 
by this value on the Web UI to find the nodes whose volume usage is 
unbalanced.
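
The sorting step can be illustrated with a short sketch. It assumes each 
datanode has already reported a single standard deviation value; the 
NodeReport type and its fields are hypothetical stand-ins and do not 
reflect the actual heartbeat or Web UI data structures.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Hadoop code: ranks datanodes so that the most
// unbalanced ones (highest standard deviation) come first.
final class UnbalancedNodeRanking {
    private UnbalancedNodeRanking() {}

    /** Hypothetical per-datanode value as it might arrive at the namenode. */
    static final class NodeReport {
        final String hostname;
        final double usageStdDev;

        NodeReport(String hostname, double usageStdDev) {
            this.hostname = hostname;
            this.usageStdDev = usageStdDev;
        }
    }

    /** Returns a new list sorted by standard deviation, highest first. */
    static List<NodeReport> sortByStdDevDescending(List<NodeReport> reports) {
        List<NodeReport> sorted = new ArrayList<>(reports);
        sorted.sort(
            Comparator.comparingDouble((NodeReport r) -> r.usageStdDev).reversed());
        return sorted;
    }
}
```

The input list is copied before sorting, so the caller's list is left 
unchanged.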



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
