Regarding your question about a pluggable module to control data placement, take a look at the abstract class BlockPlacementPolicy and at BlockPlacementPolicyDefault, its default implementation.
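One way to picture the "pluggable module" is a policy interface whose implementation is chosen by class name when the service starts. As far as I know, HDFS does this by reading the `dfs.block.replicator.classname` property, which defaults to BlockPlacementPolicyDefault. Please confirm that in BlockPlacementPolicy.getInstance for your branch. The sketch below is a self-contained toy and does not use Hadoop's API. `Node`, `PlacementPolicy`, and `MostFreeSpacePolicy` are hypothetical names, and the real class's abstract methods, such as the `chooseTarget` overloads, differ between branch-1 and trunk. Its rule is deliberately simpler than the HDFS default: one replica goes on the writer's rack, the rest go on other racks, and the emptiest nodes are chosen first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a datanode: a name, a rack, and free space in bytes. */
record Node(String name, String rack, long freeBytes) {}

/**
 * Hypothetical pluggable placement interface. It only mimics the idea behind
 * HDFS's abstract BlockPlacementPolicy; the real class has different methods.
 */
interface PlacementPolicy {
    /** Pick up to {@code replicas} target nodes for a new block. */
    List<Node> chooseTargets(String writerRack, int replicas, List<Node> nodes);

    /** Instantiate a policy from its class name, like a config-driven lookup would. */
    static PlacementPolicy load(String className) throws ReflectiveOperationException {
        return (PlacementPolicy) Class.forName(className).getDeclaredConstructor().newInstance();
    }
}

/** Toy policy: one replica on the writer's rack, the rest on other racks, most free space first. */
class MostFreeSpacePolicy implements PlacementPolicy {
    @Override
    public List<Node> chooseTargets(String writerRack, int replicas, List<Node> nodes) {
        List<Node> targets = new ArrayList<>();
        if (replicas <= 0) {
            return targets;
        }
        // Consider candidates in order of decreasing free space.
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingLong(Node::freeBytes).reversed());
        // First replica: the emptiest node on the writer's rack, if there is one.
        sorted.stream()
              .filter(n -> n.rack().equals(writerRack))
              .findFirst()
              .ifPresent(targets::add);
        // Remaining replicas: nodes on other racks, emptiest first.
        for (Node n : sorted) {
            if (targets.size() >= replicas) {
                break;
            }
            if (!n.rack().equals(writerRack)) {
                targets.add(n);
            }
        }
        return targets;
    }
}
```

Trying a different research policy then only means writing another implementation and changing the class name passed to `load`, which mirrors how a custom BlockPlacementPolicy subclass would be wired in through configuration.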
On branch-1, you can find these classes in src/hdfs/org/apache/hadoop/hdfs/server/namenode. On trunk, the package structure is different, and these classes are in hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement.

Best of luck with your research!

--Chris

On Fri, Feb 22, 2013 at 11:17 AM, Harsh J <ha...@cloudera.com> wrote:

> There are no filesystem-level (i.e. client) APIs to do this, but the
> Balancer tool of HDFS does exactly this. Reading its sources should
> show you what kind of calls you need to make to reuse the
> balancer protocol and achieve what you need.
>
> In trunk, the balancer is at
>
> hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
>
> HTH, and feel free to ask any relevant follow-up questions.
>
> On Fri, Feb 22, 2013 at 11:43 PM, Karthiek C <karthi...@gmail.com> wrote:
> > Hi,
> >
> > Are there any APIs to move data blocks in HDFS from one node to another
> > *after* they have been added to HDFS? Also, can we write some sort of
> > pluggable module (like the scheduler) that controls how data gets placed in
> > a Hadoop cluster? I am working with Hadoop 1.0.3 and I couldn't find
> > any filesystem APIs available to do that.
> >
> > PS: I am working on a research project where we want to investigate how to
> > optimally place data in Hadoop.
> >
> > Thanks,
> > Karthiek
>
> --
> Harsh J
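Before the Balancer moves anything, it decides which datanodes should give up blocks and which should receive them. Roughly, it compares each node's utilization with the cluster-wide average, using a threshold in percentage points (the Balancer's `-threshold` option, 10 by default). The sketch below models only that classification step. It does not move blocks or speak the balancer protocol, and `NodeUsage`, `UsageClass`, and `BalancerSketch` are hypothetical names rather than classes from Balancer.java. Consult the source for the exact boundary rules.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-datanode usage snapshot, in bytes. */
record NodeUsage(String name, long usedBytes, long capacityBytes) {
    double utilizationPercent() {
        return 100.0 * usedBytes / capacityBytes;
    }
}

/** Buckets relative to the cluster average (names are illustrative only). */
enum UsageClass { OVER_UTILIZED, ABOVE_AVERAGE, BELOW_AVERAGE, UNDER_UTILIZED }

final class BalancerSketch {
    private BalancerSketch() {}

    /** Cluster-wide utilization: total used over total capacity, as a percentage. */
    static double averageUtilization(List<NodeUsage> nodes) {
        long used = 0;
        long capacity = 0;
        for (NodeUsage n : nodes) {
            used += n.usedBytes();
            capacity += n.capacityBytes();
        }
        return 100.0 * used / capacity;
    }

    /** Sort node names into buckets using a threshold in percentage points. */
    static Map<UsageClass, List<String>> classify(List<NodeUsage> nodes, double thresholdPercent) {
        double avg = averageUtilization(nodes);
        Map<UsageClass, List<String>> buckets = new EnumMap<>(UsageClass.class);
        for (UsageClass c : UsageClass.values()) {
            buckets.put(c, new ArrayList<>());
        }
        for (NodeUsage n : nodes) {
            double u = n.utilizationPercent();
            UsageClass c;
            if (u > avg + thresholdPercent) {
                c = UsageClass.OVER_UTILIZED;      // candidate source of block moves
            } else if (u >= avg) {
                c = UsageClass.ABOVE_AVERAGE;
            } else if (u >= avg - thresholdPercent) {
                c = UsageClass.BELOW_AVERAGE;
            } else {
                c = UsageClass.UNDER_UTILIZED;     // candidate target of block moves
            }
            buckets.get(c).add(n.name());
        }
        return buckets;
    }
}
```

Over-utilized nodes are the natural sources and under-utilized nodes the natural targets. Pairing them up and asking datanodes to copy the chosen replicas is the part of Balancer.java you would study to reuse its protocol for your own placement experiments.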