Try deleting /system/balancer.id, then check the NameNode logs for any ERROR or WARN messages.
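A minimal sketch of those two steps, in case it helps. The log path in NN_LOG and the helper name scan_namenode_log are assumptions for illustration only; adjust them to your installation. Only remove the lock file after confirming that no balancer is currently running.

```shell
#!/usr/bin/env bash
# Hypothetical NameNode log location; override with NN_LOG=/path/to/log.
NN_LOG="${NN_LOG:-/var/log/hadoop/hdfs/hadoop-hdfs-namenode.log}"

# Print WARN/ERROR lines that mention the balancer (case-insensitive).
scan_namenode_log() {
  grep -E ' (WARN|ERROR) ' "$1" | grep -i 'balancer'
}

# Remove the balancer lock file, but only if the hdfs CLI is available
# and the file actually exists.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls /system/balancer.id && hdfs dfs -rm /system/balancer.id
fi

# Scan the NameNode log if it is readable on this host.
if [ -r "$NN_LOG" ]; then
  scan_namenode_log "$NN_LOG" || true
fi
```

If the log scan prints nothing, widening the second grep (for example to the block pool ID) may be worth a try.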

---- Replied Message ----
From: Sébastien Rebecchi <srebec...@kameleoon.com.INVALID>
Date: 3/9/2025 23:08
To: Zhanghaobo <hfutzhan...@163.com>
Cc: hadoop-user-maillist <u...@hadoop.apache.org>, hdfs-dev <hdfs-dev@hadoop.apache.org>
Subject: Re: Can not run HDFS balancer cause metrics already exists

I get the same error (metrics already exists) when adding -asService to the command line; the only difference is that it retries every 5 minutes:

2025-03-09 15:05:04,542 INFO balancer.Balancer: Finished one round, will wait for 5.0 minutes for next round

That does not seem like a good workaround: my cluster has hundreds of TB to rebalance whenever I add a data node, and I don't remember having such issues when I was using Hadoop 2.9.1.
Is there a known issue with the balancer in recent Hadoop versions?

Thanks,
Sébastien

On Sun, Mar 9, 2025 at 16:02, Sébastien Rebecchi <srebec...@kameleoon.com> wrote:

OK, I can try that then, hoping it will help.
Btw, even if it works, it does not explain this metrics exception. Any idea how to solve it? I can't find a way to delete those metrics in any Hadoop documentation.

Thanks,
Sébastien

On Sun, Mar 9, 2025 at 15:39, Zhanghaobo <hfutzhan...@163.com> wrote:

Got it. You can run it as a service and see what happens.

---- Replied Message ----
From: Sébastien Rebecchi <srebec...@kameleoon.com>
Date: 03/09/2025 22:22
To: Zhanghaobo <hfutzhan...@163.com>
Cc: u...@hadoop.apache.org, hdfs-dev@hadoop.apache.org
Subject: Re: Can not run HDFS balancer cause metrics already exists

Hi Zhanghaobo,

Thanks for the message.
No, I don't run it as a service. As I said, the command line is the following:

hdfs balancer -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.max.concurrent.moves=50 -Ddfs.datanode.balance.bandwidthPerSec=100m -Ddfs.balancer.max-size-to-move=10737418240 -threshold 1

Also, no other balancer is running concurrently on any other node.

Sébastien

On Sun, Mar 9, 2025 at 13:57, Zhanghaobo <hfutzhan...@163.com> wrote:

Hi @Sébastien Rebecchi,

I don't know the details of how you start the balancer. Did you use -asService?

---- Replied Message ----
From: Sébastien Rebecchi <srebec...@kameleoon.com.INVALID>
Date: 3/9/2025 18:03
To: <u...@hadoop.apache.org>, <hdfs-dev@hadoop.apache.org>
Subject: Re: Can not run HDFS balancer cause metrics already exists

Hello,

Could anyone help with this, please? The situation is still the same after several days.
Some additional details:
- Hadoop version: 3.4.1
- Balancer command line:

hdfs balancer -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.max.concurrent.moves=50 -Ddfs.datanode.balance.bandwidthPerSec=100m -Ddfs.balancer.max-size-to-move=10737418240 -threshold 1

Thank you

On Tue, Mar 4, 2025, 16:59, Sébastien Rebecchi <srebec...@kameleoon.com> wrote:

Hello,

After adding a new node to my HDFS cluster, I tried running the balancer, but it always fails with the following error, even after retrying multiple times during the day, and even after restarting the name node.
What should I do to unblock this?

Thanks,
Sébastien

ERROR balancer.Balancer: Exiting balancer due an exception
org.apache.hadoop.metrics2.MetricsException: Metrics source Balancer-{HERE REPLACE BY CLUSTER'S BLOCK POOL ID} already exists!
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
    at org.apache.hadoop.hdfs.server.balancer.BalancerMetrics.create(BalancerMetrics.java:52)
    at org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:362)
    at org.apache.hadoop.hdfs.server.balancer.Balancer.doBalance(Balancer.java:824)
    at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:868)
    at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:975)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
    at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1133)