Hi Senthil,
The Balancer performance was improved dramatically recently [1]. I am not sure
if you know about the new conf and parameters; see [2]. If you are interested
in more details on how the Balancer works, please see [3]. Thanks.
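As a quick sketch (the property and option names are covered in [2]; the
values below are only examples, not tuning recommendations):

  # hdfs-site.xml: per-datanode move concurrency and balancer threads
  dfs.datanode.balance.max.concurrent.moves = 50
  dfs.balancer.moverThreads = 1000
  dfs.balancer.dispatcherThreads = 200

  # CLI; in recent releases, -source limits which datanodes act as sources
  hdfs balancer -threshold 5 -source -f /tmp/source-hosts.txt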
[1] https://community.hortonworks.com/content/kbentry/43615/hdfs-balancer-1-100x-performance-improvement.html
[2] https://community.hortonworks.com/content/kbentry/43849/hdfs-balancer-2-configurations-cli-options.html
[3] https://community.hortonworks.com/content/kbentry/44148/hdfs-balancer-3-cluster-balancing-algorithm.html
Regards,
Tsz-Wo
On Thursday, August 11, 2016 6:21 AM, Senthil Kumar
<[email protected]> wrote:
Hi Team, please add your suggestion(s) here so that I can tune parameters
to balance the cluster, which is in bad shape now :( ..
--Senthil
On Thu, Aug 11, 2016 at 3:51 PM, Senthil Kumar <[email protected]>
wrote:
> Thanks Lars for your quick response!
>
> Here is my Cluster Utilization..
> DFS Used% : 74.39%
> DFS Remaining% : 25.60%
>
>
> Block Pool Used% : 74.39%
> DataNodes usages : Min 1.25% | Median 99.72% | Max 99.99% | stdev 22.53%
> Hadoop Version : 2.4.1
>
> Let's take an example :
>
> Cluster Live Nodes : 1000
> Capacity Used 95-99% : 700
> Capacity Used 90-95% : 50
> Capacity Used < 90% : 250
>
> I'm looking for an option to quickly balance data from the nodes in the
> 90-95% category over to the < 90% category. I know there are options like
> -include & -exclude, but they are not helping me (or am I not using them
> effectively? Please advise how to use these options properly if I want to
> balance my cluster as described above).
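>
> For reference, here is how I tried them (the file path is just an example;
> the file lists one hostname per line):
>
>   /apache/hadoop/sbin/start-balancer.sh -threshold 10 \
>       -include -f /tmp/under-90-hosts.txt
>
> My understanding is that -include only restricts which datanodes the
> balancer touches at all, and -exclude skips them entirely; neither lets me
> pick sources vs destinations.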
>
> Is there an option like --force-balance (taking two additional inputs,
> force-balance-source-hosts(file) and force-balance-dest-hosts(file))?
> That way I believe we could achieve balancing in urgency mode when 90% of
> the nodes are hitting 99% disk usage, or when the median is 95% or above..
> Please add your thoughts here..
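>
> To illustrate the idea (purely hypothetical syntax; none of these flags
> exist today as far as I know):
>
>   hdfs balancer --force-balance \
>       --force-balance-source-hosts /tmp/src-hosts.txt \
>       --force-balance-dest-hosts /tmp/dest-hosts.txt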
>
>
> Here is the code that constructs the network topology and categorizes the
> datanodes as over-utilized, above/below average, and under-utilized.
> Sometimes I see nodes with 70% usage also land in the over-utilized bucket
> (tried with threshold values from 10 to 30). Correct me if anything is
> wrong in my understanding.
>
> https://github.com/apache/hadoop/tree/release-2.4.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer
>
> /* create network topology and all data node lists:
>  * overloaded, above-average, below-average, and underloaded
>  * we alternate accessing the given datanodes array in increasing
>  * or decreasing order.
>  */
> long overLoadedBytes = 0L, underLoadedBytes = 0L;
> for (DatanodeInfo datanode : DFSUtil.shuffle(datanodes)) {
>   if (datanode.isDecommissioned() || datanode.isDecommissionInProgress()) {
>     continue; // ignore decommissioning or decommissioned nodes
>   }
>   cluster.add(datanode);
>   BalancerDatanode datanodeS;
>   final double avg = policy.getAvgUtilization();
>   if (policy.getUtilization(datanode) > avg) {
>     // utilization above the cluster average: the node is a potential source
>     datanodeS = new Source(datanode, policy, threshold);
>     if (isAboveAvgUtilized(datanodeS)) {
>       this.aboveAvgUtilizedDatanodes.add((Source)datanodeS);
>     } else {
>       assert(isOverUtilized(datanodeS)) :
>         datanodeS.getDisplayName() + " is not an overUtilized node";
>       this.overUtilizedDatanodes.add((Source)datanodeS);
>       overLoadedBytes += (long)((datanodeS.utilization - avg
>           - threshold) * datanodeS.datanode.getCapacity() / 100.0);
>     }
>   } else {
>     // utilization at or below the average: the node is a potential target
>     datanodeS = new BalancerDatanode(datanode, policy, threshold);
>     if (isBelowOrEqualAvgUtilized(datanodeS)) {
>       this.belowAvgUtilizedDatanodes.add(datanodeS);
>     } else {
>       assert isUnderUtilized(datanodeS) : "isUnderUtilized("
>           + datanodeS.getDisplayName() + ")=" + isUnderUtilized(datanodeS)
>           + ", utilization=" + datanodeS.utilization;
>       this.underUtilizedDatanodes.add(datanodeS);
>       underLoadedBytes += (long)((avg - threshold
>           - datanodeS.utilization) * datanodeS.datanode.getCapacity() / 100.0);
>     }
>   }
>   datanodeMap.put(datanode.getDatanodeUuid(), datanodeS);
> }
>
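> If I am reading the checks right (my own worked example, please correct
> me): with avg = 55% and threshold = 10, a node at 70% utilization satisfies
> 70 > 55 + 10 = 65, so it lands in overUtilizedDatanodes even though it is
> "only" 70% full. That would explain what I am seeing.
>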
>
> Could someone help me here to understand the balancing policy, and which
> parameters I should use to balance the cluster (i.e. bring down the
> median)?
>
> --Senthil
>
> On Wed, Aug 10, 2016 at 8:21 PM, Lars Francke <[email protected]>
> wrote:
>
>> Hi Senthil,
>>
>> I'm not sure I fully understand.
>>
>> If you're using a threshold of 30, that means there is a 60-percentage-point
>> range that the balancer would consider to be okay.
>>
>> Example: say the used space divided by your total available space in the
>> cluster is 80%. Then, with a 30% threshold, the balancer would try to bring
>> all nodes within the range of 50-100% utilisation.
>>
>> The default threshold is 10%, and that's still a fairly wide range,
>> especially on clusters that are almost at capacity. So a threshold of 5 or
>> even lower might work for you.
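>>
>> For example (same numbers as above, just to illustrate): with an average
>> utilisation of 80% and -threshold 5, the balancer would try to bring every
>> DataNode into the 75-85% range:
>>
>>   /apache/hadoop/sbin/start-balancer.sh -threshold 5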
>>
>> What is your utilisation in the cluster (used space / available space)?
>>
>> Cheers,
>> Lars
>>
>> On Wed, Aug 10, 2016 at 3:27 PM, Senthil Kumar <[email protected]>
>> wrote:
>>
>>> Hi Team, we are running a big cluster (3000 nodes), and many times we
>>> see the median utilization climb to 99.99% (on 80% of the DNs). The
>>> balancer is running all the time in the cluster, but the median is still
>>> not coming down, i.e. below 90%.
>>>
>>> Here is how I start the balancer:
>>> /apache/hadoop/sbin/start-balancer.sh
>>> -Ddfs.balance.bandwidthPerSec=104857600 -threshold 30
>>>
>>> What is the recommended value for threshold? And is there any way to pass
>>> a param so that blocks move only from over-utilized (98-100%) nodes to
>>> under-utilized ones?
>>>
>>>
>>> Pls advise!
>>>
>>> Regards,
>>> Senthil
>>>
>>
>>
>