+1

The code looks good in general. It is great that there are a lot of tests and documentation. Some minor comments which can be addressed after merge:

- There are a few TODOs in the code.
- Tried the help command "hdfs diskbalancer -help plan". There is a typo "wetolerate" in --thresholdPercentage. Also, we should mention the unit for --bandwidth.
- We should avoid using the same class name such as DiskBalancer, which is defined in both the datanode and tools packages. It may be better to call it DiskBalancerCli for the one in tools.
- I still think that it is better to use weighted mean and weighted variance in the calculation.

Thanks.
Tsz-Wo
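For readers unfamiliar with the weighted mean and weighted variance suggestion, here is a minimal sketch. It assumes that each volume's used-space ratio is weighted by that volume's capacity, which is one plausible choice of weight and is not stated in this thread. The class `WeightedStats` and its methods are hypothetical names for illustration and are not taken from the HDFS-1312 branch.

```java
// Hypothetical helper for illustration only; not part of the HDFS-1312 branch.
final class WeightedStats {
    private WeightedStats() { }

    /** Weighted mean: sum(w_i * x_i) / sum(w_i). */
    static double weightedMean(double[] values, double[] weights) {
        double totalWeight = checkAndSumWeights(values, weights);
        double weightedSum = 0.0;
        for (int i = 0; i < values.length; i++) {
            weightedSum += weights[i] * values[i];
        }
        return weightedSum / totalWeight;
    }

    /** Weighted population variance: sum(w_i * (x_i - mean)^2) / sum(w_i). */
    static double weightedVariance(double[] values, double[] weights) {
        double totalWeight = checkAndSumWeights(values, weights);
        double mean = weightedMean(values, weights);
        double weightedSquares = 0.0;
        for (int i = 0; i < values.length; i++) {
            double diff = values[i] - mean;
            weightedSquares += weights[i] * diff * diff;
        }
        return weightedSquares / totalWeight;
    }

    /** Validates the inputs and returns the sum of the weights. */
    private static double checkAndSumWeights(double[] values, double[] weights) {
        if (values.length == 0 || values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must be non-empty and of equal length");
        }
        double totalWeight = 0.0;
        for (double w : weights) {
            if (w < 0.0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            totalWeight += w;
        }
        if (totalWeight <= 0.0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        return totalWeight;
    }

    public static void main(String[] args) {
        // Assumed example: used-space ratios of two volumes, weighted by capacity.
        double[] usedRatio = {0.9, 0.5};
        double[] capacity = {1.0, 3.0};
        System.out.println("weighted mean     = " + weightedMean(usedRatio, capacity));
        System.out.println("weighted variance = " + weightedVariance(usedRatio, capacity));
    }
}
```

With capacity as the weight, a small, nearly full disk pulls the mean less than a large disk at the same ratio would, which is the practical difference from an unweighted calculation.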
On Thursday, June 16, 2016 8:38 AM, Anu Engineer <aengin...@hortonworks.com> wrote:

Hi All,

I would like to propose a merge vote for HDFS-1312 (Disk balancer) branch to trunk. This branch creates a new tool that allows balancing of data on a datanode. The voting commences now and will run for 7 days till Jun/22/2016 5:00 PM PST.

This tool distributes data evenly between the disks of the same type on a datanode. This is useful if a disk has been replaced or if some disks are out of space compared to the rest of the disks.

The current set of commands supported are:

1. Plan - Allows the user to create a plan and review it. The plan describes how the data will be moved in the datanode.
2. Execute - Allows execution of a plan against a datanode.
3. Query - Queries the status of disk balancer execution.
4. Cancel - Cancels a running disk balancer plan.
5. Report - Reports the current state of data distribution on a node.

- The original proposal that captures the rationale and possible solution is here. [ https://issues.apache.org/jira/secure/attachment/12755226/disk-balancer-proposal.pdf ]
- The updated architecture and test plan document is here. [ https://issues.apache.org/jira/secure/attachment/12810720/Architecture_and_test_update.pdf ]
- The merge patch that is a diff against trunk is posted here. [ https://issues.apache.org/jira/secure/attachment/12810943/HDFS-1312.001.patch ]
- The user documentation which will be part of Apache is posted here. [ https://issues.apache.org/jira/secure/attachment/12805976/HDFS-9547-HDFS-1312.002.patch ]

HDFS-1312 has a set of sub-tasks and they are ordered in the same sequence as they were committed to HDFS-1312. Hopefully this will make it easy to code review this branch.

There is a set of commands which we would like to do later, including discovering which datanodes in the cluster would benefit from running disk balancer. Appropriate JIRAs for these future work items are filed under HDFS-1312.

Disk Balancer is made possible by the work of many community members including Arpit Agarwal, Vinayakumar B, Mingliang Liu, Tsz Wo Nicholas Sze, Lei (Eddy) Xu and Xiaobing Zhou. I would like to thank them all for the effort and support.

Thanks
Anu