[ 
https://issues.apache.org/jira/browse/HBASE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293105#comment-15293105
 ] 

Guanghao Zhang commented on HBASE-15529:
----------------------------------------

The failing test TestStochasticLoadBalancer2#testRegionReplicasOnLargeCluster is 
related to this patch. The assertion that fails is:
{code}
      if (assertFullyBalanced) {
        assertClusterAsBalanced(balancedCluster);
        List<RegionPlan> secondPlans = loadBalancer.balanceCluster(serverMap);
        assertNull(secondPlans); // this assertion fails
      } 
{code}

I ran it on my PC. The log from the first balance run is:
{code}
2016-05-20 16:32:21,319 INFO  [Time-limited test] 
balancer.StochasticLoadBalancer(355): start StochasticLoadBalancer.balaner, 
initCost=100051.75175175176, functionCost=RegionCountSkewCostFunction : (500.0, 
0.05); PrimaryRegionCountSkewCostFunction : (500.0, 0.05); MoveCostFunction : 
(0.0, 0.0); LocalityCostFunction : (0.0, 0.0); TableSkewCostFunction : (35.0, 
0.05005005005005005); RegionReplicaHostCostFunction : (100000.0, 1.0); 
RegionReplicaRackCostFunction : (10000.0, 0.0); ReadRequestCostFunction : (5.0, 
0.0); WriteRequestCostFunction : (5.0, 0.0); MemstoreSizeCostFunction : (5.0, 
0.0); StoreFileCostFunction : (5.0, 0.0); 
2016-05-20 16:38:21,334 DEBUG [Time-limited test] 
balancer.StochasticLoadBalancer(411): Finished computing new load balance plan. 
 Computation took 360001ms to try 1509653 different iterations.  Found a 
solution that moves 44354 regions; Going from a computed cost of 
100051.75175175176 to a new cost of 1.951951951951952
{code}

After asserting that the cluster is balanced, the test runs the balancer again 
and asserts that it produces no plan. But the log from the second run is:
{code}
2016-05-20 16:38:21,591 INFO  [Time-limited test] 
balancer.StochasticLoadBalancer(355): start StochasticLoadBalancer.balaner, 
initCost=0.4098264931598265, functionCost=RegionCountSkewCostFunction : (500.0, 
0.0); PrimaryRegionCountSkewCostFunction : (500.0, 4.004004004004004E-4); 
MoveCostFunction : (0.0, 0.0); LocalityCostFunction : (0.0, 0.0); 
TableSkewCostFunction : (35.0, 0.005989322655989323); 
RegionReplicaHostCostFunction : (100000.0, 0.0); RegionReplicaRackCostFunction 
: (10000.0, 0.0); ReadRequestCostFunction : (5.0, 0.0); 
WriteRequestCostFunction : (5.0, 0.0); MemstoreSizeCostFunction : (5.0, 0.0); 
StoreFileCostFunction : (5.0, 0.0); 
2016-05-20 16:44:21,565 DEBUG [Time-limited test] 
balancer.StochasticLoadBalancer(411): Finished computing new load balance plan. 
 Computation took 360001ms to try 1672827 different iterations.  Found a 
solution that moves 8 regions; Going from a computed cost of 0.4098264931598265 
to a new cost of 0.3097263930597264
{code}

The original needsBalance decides whether to balance based only on region 
count, so it does not balance again and assertNull(secondPlans) succeeds. The 
new needsBalance decides based on all cost functions, so it balances again. 
Even with the max running time set to 360s, the balancer cannot balance the 
cluster completely and cannot bring every cost function's cost down to 0. So I 
set hbase.master.balancer.stochastic.minCostNeedBalance to 0.05 for this unit 
test, and it passes on my PC. Attached a v2 patch to fix this.
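
To make the effect of the threshold concrete, here is a minimal sketch of the 
check described in the issue, (total cost / sum multiplier) > minCostNeedBalance, 
applied to the multipliers and costs printed in the two logs above. It is not 
the code from the patch: the class name, the method names, and the guard for a 
zero multiplier sum are assumptions made for illustration only.

```java
// Illustrative sketch only; NeedsBalanceSketch and its methods are hypothetical
// names, not part of HBase or of the attached patch.
public class NeedsBalanceSketch {

    /** Weighted total cost: sum over all cost functions of multiplier * cost. */
    static double totalCost(double[] multipliers, double[] costs) {
        double total = 0.0;
        for (int i = 0; i < multipliers.length; i++) {
            total += multipliers[i] * costs[i];
        }
        return total;
    }

    /** Sum of all multipliers. */
    static double sumMultiplier(double[] multipliers) {
        double sum = 0.0;
        for (double m : multipliers) {
            sum += m;
        }
        return sum;
    }

    /** Balance only when (total cost / sum multiplier) > minCostNeedBalance. */
    static boolean needsBalance(double[] multipliers, double[] costs,
                                double minCostNeedBalance) {
        double sum = sumMultiplier(multipliers);
        if (sum <= 0.0) {
            return false; // assumption: nothing is weighted, so nothing to balance
        }
        return totalCost(multipliers, costs) / sum > minCostNeedBalance;
    }

    public static void main(String[] args) {
        // Multipliers in the order the cost functions appear in the logs.
        double[] multipliers = {500, 500, 0, 0, 35, 100000, 10000, 5, 5, 5, 5};
        // Per-function costs at the start of the first balance run.
        double[] firstRun = {0.05, 0.05, 0, 0, 0.05005005005005005, 1.0,
                             0, 0, 0, 0, 0};
        // Per-function costs at the start of the second balance run.
        double[] secondRun = {0.0, 4.004004004004004E-4, 0, 0,
                              0.005989322655989323, 0, 0, 0, 0, 0, 0};

        System.out.println(needsBalance(multipliers, firstRun, 0.05));  // true
        System.out.println(needsBalance(multipliers, secondRun, 0.05)); // false
    }
}
```

The multipliers sum to 111055. The first state has a ratio of roughly 0.9, well 
above 0.05, so it is balanced. The second state has a ratio of roughly 3.7E-6, 
so with the threshold at 0.05 no second plan would be computed, whereas any 
threshold below that ratio (for example 0) still triggers another run.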

> Override needBalance in StochasticLoadBalancer
> ----------------------------------------------
>
>                 Key: HBASE-15529
>                 URL: https://issues.apache.org/jira/browse/HBASE-15529
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Minor
>         Attachments: 15529-v1.patch, HBASE-15529-v1.patch, HBASE-15529.patch
>
>
> StochasticLoadBalancer includes cost functions to compute the cost of region 
> count, r/w qps, table load, region locality, memstore size, and storefile 
> size. Every cost function returns a number between 0 and 1 inclusive, and the 
> computed costs are scaled by their respective multipliers. A bigger 
> multiplier gives the respective cost function a bigger weight. 
> But needBalance decides whether to balance based only on region count and 
> doesn't consider r/w qps or locality, even if you configure these cost 
> functions with bigger multipliers. StochasticLoadBalancer should override 
> needBalance and decide whether to balance based on its cost function configs.
> Add one new config, hbase.master.balancer.stochastic.minCostNeedBalance: the 
> cluster needs balancing when (total cost / sum multiplier) > minCostNeedBalance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
