Re: HDFS Balancer Stuck after 10 Minz

Rakesh Radhakrishnan Thu, 08 Sep 2016 07:18:25 -0700

Have you taken multiple thread dumps (jstack) and looked at which operations
the balancer is performing during this period? There is a good chance it is
busy searching for data blocks that it can move around to balance the
cluster.
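
In case it helps, here is a minimal sketch of collecting several dumps a few
seconds apart so they can be compared. It assumes jstack from the JDK is on
the PATH and is run as the same user that owns the Balancer JVM. The function
name, file names, and defaults are made up for illustration; a plain shell
loop would do the same job.

```python
import subprocess
import time
from pathlib import Path


def collect_thread_dumps(pid, count=3, interval_s=10.0, out_dir=".",
                         dump_cmd=("jstack",)):
    """Run `<dump_cmd> <pid>` `count` times and save each output to a file.

    Returns the list of files written. `dump_cmd` defaults to jstack and is
    a parameter only so that another dump tool can be substituted.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        # check=True raises CalledProcessError if the dump command fails,
        # for example when the PID no longer exists.
        result = subprocess.run(
            [*dump_cmd, str(pid)], capture_output=True, text=True, check=True
        )
        path = out / f"balancer-{pid}-dump{i + 1}.txt"
        path.write_text(result.stdout)
        paths.append(path)
        if i < count - 1:
            time.sleep(interval_s)  # wait before taking the next dump
    return paths


# Example (not executed here): use the Balancer PID reported by jps.
# collect_thread_dumps(1003, count=3, interval_s=10.0)
```

If the same threads show the same stack in every dump, that is usually the
place to start looking.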


Could you tell me the used space and available space values? Have you tried
lowering the threshold, maybe to 10 or 5, and if so, what happens with that
value? Also, since there seem to be no log messages during the 15-minute
period, is there any possibility of enabling debug-level logging to dig
further into the problem?
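
To illustrate why the threshold matters: as I understand it, the balancer
treats a datanode as balanced when its utilization is within the threshold
(in percentage points) of the cluster-wide utilization. The sketch below
applies that rule to made-up numbers; the node names and sizes are
hypothetical and not taken from your cluster, and the real balancer has more
categories and constraints than this.

```python
def classify_nodes(nodes, threshold_pct):
    """Split nodes by whether they fall outside avg +/- threshold.

    nodes maps a node name to (used, capacity) in any consistent unit.
    Returns (cluster_avg_pct, over_utilized, under_utilized, balanced).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    avg = 100.0 * total_used / total_capacity

    over, under, balanced = [], [], []
    for name, (used, capacity) in sorted(nodes.items()):
        util = 100.0 * used / capacity
        if util > avg + threshold_pct:
            over.append(name)
        elif util < avg - threshold_pct:
            under.append(name)
        else:
            balanced.append(name)
    return avg, over, under, balanced


# Hypothetical cluster: two nearly full nodes and two emptier ones.
example_nodes = {
    "old-1": (98, 100),
    "old-2": (97, 100),
    "new-1": (40, 100),
    "new-2": (45, 100),
}

# Cluster average is 70%. With threshold 30 the band is 40..100, so every
# node counts as balanced; with threshold 10 the band is 60..80 and all
# four nodes fall outside it.
wide = classify_nodes(example_nodes, 30)
narrow = classify_nodes(example_nodes, 10)
```

With a wide threshold, very few nodes qualify as a source or a target, which
may leave the balancer with little to do once the first candidates are
exhausted.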

Rakesh

On Thu, Sep 8, 2016 at 6:15 PM, Senthil Kumar <senthilec...@gmail.com>
wrote:

> Hi All, we need to rebalance the cluster data since the median usage has
> reached 98%. I started the balancer as below:
>
> Hadoop Version: Hadoop 2.4.1
>
>
> /apache/hadoop/sbin/start-balancer.sh   -threshold  30
>
>
> Once I start the balancer it goes well for the first 8-10 minutes, and
> blocks are moved quickly during that time. After a while (say 10 minutes)
> the balancer is almost stuck, and I am not sure what is happening in the
> cluster.
>
> Log excerpts :
>
> 2016-09-08 04:58:15,653 INFO
> org.apache.hadoop.hdfs.server.balancer.Balancer: Successfully moved
> blk_-5830766563502877304_1279767737 with size=134217728 from
> 10.103.21.27:1004 to 10.142.21.56:1004 through 10.103.21.27:1004
>
> 2016-09-08 04:59:14,426 INFO
> org.apache.hadoop.hdfs.server.balancer.Balancer: Successfully moved
> blk_2601479900_1104500421142 with size=268435456 from 10.103.84.51:1004 to
> 10.142.18.27:1004 through 10.103.84.16:1004
>
> 2016-09-08 05:01:15,037 INFO
> org.apache.hadoop.hdfs.server.balancer.Balancer: Successfully moved
> blk_3073791211_1104972921837 with size=268435456 from 10.103.21.27:1004 to
> 10.142.21.56:1004 through 10.103.21.42:1004
>
>
>
> [05:16]:[hadoop@lvsaishdc3sn0002:~]$ date
>
> Thu Sep  8 05:16:53 GMT+7 2016
>
> [05:16]:[hadoop@lvsaishdc3sn0002:~]$ jps
>
> 1003 Balancer
>
> 20388 Jps
>
>
>
> Last Block Mover Timestamp     : 05:01
>
> Current Timestamp                    : 05:16
>
>
> For almost 15 minutes no blocks have been moved by the balancer. What could
> be the issue here? A restart would help us start moving again.
>
>
>
> It’s not even getting past iteration 1.
>
>
> I found one thread discussing the same issue:
>
> http://lucene.472066.n3.nabble.com/A-question-about-Balancer-in-HDFS-td4118505.html
>
>
> Please suggest how to balance the cluster.
>
>
> --Senthil
>
