If you're looking for the online solution, Aaron's just posted a working implementation of it at https://issues.apache.org/jira/browse/HDFS-1804.
For the offline or asynchronous disk balancer discussed in
https://issues.apache.org/jira/browse/HDFS-1312, if you want your tool
to be part of the upstream project, I'd encourage posting your design
for vetting/comments first and the implementation afterwards, so that
all the finer points get covered.

The offline tool is the easiest to write, and can also exist in Python
(outside of HDFS, hosted in a GitHub repo perhaps), as it doesn't
really have to work with the DN's or NN's protocol calls. Understanding
the block data directory structure (run ls -l on one of your
dfs.data.dir / dfs.datanode.data.dir directories and follow it down)
should let you write one up easily.

On Wed, Apr 3, 2013 at 6:36 PM, Kevin Lyda <ke...@ie.suberic.net> wrote:
> I've been following https://issues.apache.org/jira/browse/HDFS-1312
> and really need the balancing tool described therein. I'd be
> interested in writing it, but am not sure where to start. I'm more
> comfortable in Python, but I suspect it has a better chance of being
> integrated if I do it in Java.
>
> Is hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop the
> place to look for interfaces to manipulate the filesystem?
>
> Kevin
>
> --
> Kevin Lyda
> Galway, Ireland
> US Citizen overseas? We can vote.
> Register now: http://www.votefromabroad.org/

--
Harsh J
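
As a rough illustration of the offline approach described above, here
is a minimal Python sketch of the planning half of such a tool. It is
not taken from HDFS-1312 or any existing implementation. It assumes
that block files are named blk_<id> with a companion
blk_<id>_<genstamp>.meta file in the same directory (check this
against your own ls -l output, since the layout differs between
versions), that all disks have the same capacity, and that the
DataNode is stopped while anything is moved. The names scan_blocks,
plan_moves and threshold_bytes are made up for this example. It only
produces a list of proposed moves and does not touch any file.

```python
import os
import re
from collections import namedtuple

# Assumed naming: "blk_<id>" for data, "blk_<id>_<genstamp>.meta" next to it.
BLOCK_RE = re.compile(r"^blk_-?\d+$")

Block = namedtuple("Block", ["path", "size"])


def scan_blocks(data_dir):
    """Find block files under data_dir; size includes any matching .meta file."""
    blocks = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if not BLOCK_RE.match(name):
                continue
            size = os.path.getsize(os.path.join(root, name))
            for other in files:
                if other.startswith(name + "_") and other.endswith(".meta"):
                    size += os.path.getsize(os.path.join(root, other))
            blocks.append(Block(os.path.join(root, name), size))
    return blocks


def plan_moves(data_dirs, threshold_bytes=0):
    """Greedy dry-run plan: returns a list of (block_path, destination_dir).

    Repeatedly takes the largest block that fits in half the gap between
    the fullest and the emptiest directory, so a move never overshoots.
    """
    contents = {d: scan_blocks(d) for d in data_dirs}
    used = {d: sum(b.size for b in contents[d]) for d in data_dirs}
    moves = []
    while len(used) > 1:
        src = max(used, key=used.get)
        dst = min(used, key=used.get)
        gap = used[src] - used[dst]
        if gap <= threshold_bytes:
            break
        fits = [b for b in contents[src] if 0 < b.size <= gap / 2]
        if not fits:
            break  # nothing left that would improve the balance
        block = max(fits, key=lambda b: b.size)
        contents[src].remove(block)  # a planned block is never moved twice
        used[src] -= block.size
        used[dst] += block.size
        moves.append((block.path, dst))
    return moves
```

Calling plan_moves() with the list of directories from your DataNode
configuration gives you a plan to review before moving anything. A
real tool would also have to move each .meta file together with its
block and recreate whatever subdirectory layout your Hadoop version
expects on the destination disk.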