Have you taken multiple thread dumps (jstack) and looked at which operations are running during this period? There is a good chance the balancer is busy searching for data blocks that it can move around to balance the cluster.
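In case it helps, here is a minimal sketch of how several dumps could be collected a few seconds apart, so they can be compared afterwards. The function name take_dumps, the output file names, and the JSTACK override are my own assumptions for illustration; only jstack itself (from the JDK) is a real tool here.

```shell
# Sketch: capture several thread dumps of a running JVM, a few seconds apart.
# take_dumps and the JSTACK override are hypothetical helpers, not part of Hadoop.

take_dumps() {
  local pid="$1"
  local count="${2:-5}"                 # number of dumps to take
  local interval="${3:-10}"             # seconds to wait between dumps
  local outdir="${4:-.}"                # directory for the dump files
  local jstack_cmd="${JSTACK:-jstack}"  # JDK tool; override only for testing
  local i=1
  mkdir -p "$outdir" || return 1
  while [ "$i" -le "$count" ]; do
    # One file per dump so successive dumps can be diffed afterwards.
    "$jstack_cmd" "$pid" > "$outdir/balancer-jstack-$i.txt" || return 1
    # Sleep between dumps, but not after the last one.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}
```

For example, `take_dumps 1003 5 10 /tmp/balancer-dumps` would take five dumps of the Balancer process (pid 1003 in your jps output), ten seconds apart. Threads that show the same stack in every dump are the ones worth looking at first.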
Could you tell me the used space and available space values? Have you tried lowering the threshold, maybe to 10 or 5, and what happens with that value? Also, I think there are no log messages during the 15-minute period. Is there any possibility of enabling debug logging to dig further into the problem?

Rakesh

On Thu, Sep 8, 2016 at 6:15 PM, Senthil Kumar <senthilec...@gmail.com> wrote:

> Hi All,
>
> We need to balance the cluster data since the median has reached 98%. I
> started the balancer as below.
>
> Hadoop Version: Hadoop 2.4.1
>
> /apache/hadoop/sbin/start-balancer.sh -threshold 30
>
> Once I start the balancer it goes well for the first 8-10 minutes. The
> balancer was moving blocks quickly for the first 10 minutes. I am not sure
> what is happening in the cluster after some time (say 10 minutes); the
> balancer is almost stuck.
>
> Log excerpts:
>
> 2016-09-08 04:58:15,653 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Successfully moved blk_-5830766563502877304_1279767737 with size=134217728 from 10.103.21.27:1004 to 10.142.21.56:1004 through 10.103.21.27:1004
>
> 2016-09-08 04:59:14,426 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Successfully moved blk_2601479900_1104500421142 with size=268435456 from 10.103.84.51:1004 to 10.142.18.27:1004 through 10.103.84.16:1004
>
> 2016-09-08 05:01:15,037 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: Successfully moved blk_3073791211_1104972921837 with size=268435456 from 10.103.21.27:1004 to 10.142.21.56:1004 through 10.103.21.42:1004
>
> [05:16]:[hadoop@lvsaishdc3sn0002:~]$ date
> Thu Sep 8 05:16:53 GMT+7 2016
> [05:16]:[hadoop@lvsaishdc3sn0002:~]$ jps
> 1003 Balancer
> 20388 Jps
>
> Last Block Mover Timestamp : 05:01
> Current Timestamp : 05:16
>
> For almost 15 minutes no blocks have been moved by the balancer. What could
> be the issue here? A restart would help us start moving again.
>
> It’s not even passing iteration 1.
>
> I found one thread discussing the same issue:
>
> http://lucene.472066.n3.nabble.com/A-question-about-Balancer-in-HDFS-td4118505.html
>
> Please suggest how to balance the cluster.
>
> --Senthil
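
To track the gap between the last block move and the current time without reading the log by hand, something like the following sketch might be useful. The helper name last_move_age is made up for illustration, and it assumes GNU date (for `date -d`) and log lines that start with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp, as in the excerpt above.

```shell
# Sketch: seconds since the balancer last logged a successful block move.
# last_move_age is a hypothetical helper; it assumes GNU date (for `date -d`)
# and log lines that start with "YYYY-MM-DD HH:MM:SS,mmm" as in the excerpt.

last_move_age() {
  local logfile="$1"
  local now="${2:-$(date +%s)}"   # optional: epoch seconds to compare against
  local stamp
  local last
  # Timestamp (first 19 characters) of the most recent successful move.
  stamp="$(grep 'Successfully moved' "$logfile" | tail -n 1 | cut -c1-19)"
  [ -n "$stamp" ] || return 1     # no block move found in the log
  # Parsed in the local time zone, so run this on the balancer host.
  last="$(date -d "$stamp" +%s)" || return 1
  echo $((now - last))
}
```

Calling `last_move_age balancer.log` (where balancer.log stands for wherever your balancer log is written) prints the number of seconds since the last "Successfully moved" entry. A value that keeps growing while the Balancer process is still listed by jps would confirm the stall and tell you when to take the thread dumps.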