Alexey Serbin created KUDU-3511:
-----------------------------------

             Summary: Improve 'kudu rebalance' tool to even out location loads
                 Key: KUDU-3511
                 URL: https://issues.apache.org/jira/browse/KUDU-3511
             Project: Kudu
          Issue Type: Improvement
          Components: CLI
    Affects Versions: 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.11.1, 1.12.0, 1.13.0,
                      1.14.0, 1.15.0, 1.16.0, 1.17.0
            Reporter: Alexey Serbin


In Kudu 1.17.0 and earlier versions, rebalancing a location-aware cluster 
doesn't even out location loads properly (the location load is defined as "the 
total number of replicas in a location divided by the number of tablet servers 
in the location").  The issue is easy to notice when rebalancing, for example, 
a Kudu cluster with 3 locations and 3 tablet servers in each location, where 
all tables have a replication factor of 5.

This behavior is explained by the code in 
{{LocationBalancingAlgo::IsBalancingNeeded()}}: all the criteria it uses to 
detect a remaining imbalance are per-table, and the imbalance in location loads 
isn't considered at all.
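
The gap can be illustrated with a minimal sketch (not Kudu code; all names in 
it are hypothetical) that uses the replica distribution from the RF5 test 
scenario in this report.  Each table on its own is spread 2/2/1 across the 
three locations, which is as even as 5 replicas over 3 locations can get, yet 
the location loads come out as 2, 2 and 1 because every table keeps only a 
single replica in L2.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper types for illustration only; not part of Kudu.
constexpr std::size_t kNumLocations = 3;
constexpr std::size_t kServersPerLocation = 3;

// Replica counts of one table per tablet server, in the order
// 00, 01, 02, 10, 11, 12, 20, 21, 22 (three servers per location).
using ReplicaCounts = std::array<int, kNumLocations * kServersPerLocation>;

// Number of replicas a single table keeps in each location.
std::array<int, kNumLocations> ReplicasPerLocation(const ReplicaCounts& table) {
  std::array<int, kNumLocations> result{};
  for (std::size_t i = 0; i < table.size(); ++i) {
    result[i / kServersPerLocation] += table[i];
  }
  return result;
}

// Location load as defined in this report: the total number of replicas in a
// location divided by the number of tablet servers in the location.
std::array<double, kNumLocations> LocationLoads(
    const std::vector<ReplicaCounts>& tables) {
  std::array<double, kNumLocations> loads{};
  for (const auto& table : tables) {
    const auto per_location = ReplicasPerLocation(table);
    for (std::size_t loc = 0; loc < kNumLocations; ++loc) {
      loads[loc] += per_location[loc];
    }
  }
  for (auto& load : loads) {
    load /= kServersPerLocation;
  }
  return loads;
}

// Replica distribution of tables A, B and C from the RF5 test scenario.
std::vector<ReplicaCounts> ScenarioTables() {
  return {
    ReplicaCounts{1, 1, 0, 1, 1, 0, 1, 0, 0},  // table A
    ReplicaCounts{0, 1, 1, 1, 0, 1, 0, 1, 0},  // table B
    ReplicaCounts{1, 0, 1, 0, 1, 1, 0, 0, 1},  // table C
  };
}
```

Calling {{ReplicasPerLocation()}} on each table from {{ScenarioTables()}} gives 
the same 2/2/1 split, so a purely per-table view has nothing left to fix, while 
{{LocationLoads()}} exposes L2 as the underloaded location.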

Essentially, the following test scenario (if added into 
{{$KUDU_ROOT/src/kudu/rebalance/rebalance_algo-test.cc}}) fails because the 
algorithm suggests no moves at all instead of the two moves expected in the 
scenario below.

{noformat}
TEST(RebalanceAlgoUnitTest, RF5) {
  const TestClusterConfig kConfigs[] = {
    {
      {
        { "L0", { "00", "01", "02", }, },
        { "L1", { "10", "11", "12", }, },
        { "L2", { "20", "21", "22", }, },
      },
      { "00", "01", "02", "10", "11", "12", "20", "21", "22", },
      {
        { "A", "", { 1, 1, 0, 1, 1, 0, 1, 0, 0, } },
        { "B", "", { 0, 1, 1, 1, 0, 1, 0, 1, 0, } },
        { "C", "", { 1, 0, 1, 0, 1, 1, 0, 0, 1, } },
      },
      {
        { "B", "", "12", "20" },
        { "C", "", "02", "21" },
      }
    },
  };
  VERIFY_LOCATION_BALANCING_MOVES(kConfigs);
}
{noformat}
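
The arithmetic behind the two expected moves can be checked with a small 
standalone sketch.  It is not Kudu code: the names are hypothetical, and the 
per-server totals are obtained by summing the replica counts of tables A, B 
and C from the scenario above.  Before the moves the locations hold 6, 6 and 3 
replicas; after moving one replica of B from "12" to "20" and one replica of C 
from "02" to "21", each location holds 5, so every location load is 5/3.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

// Total replicas per tablet server, summed over tables A, B and C, in the
// order 00, 01, 02, 10, 11, 12, 20, 21, 22 (hypothetical helper type).
using ServerTotals = std::array<int, 9>;
constexpr ServerTotals kBefore = {2, 2, 2, 2, 2, 2, 1, 1, 1};

// Move one replica from the server at index `from` to the server at `to`.
ServerTotals ApplyMove(ServerTotals totals, std::size_t from, std::size_t to) {
  assert(totals[from] > 0);  // the source server must host a replica
  --totals[from];
  ++totals[to];
  return totals;
}

// Replicas hosted in location `loc`; each location owns three consecutive
// entries of the array.
int ReplicasInLocation(const ServerTotals& totals, std::size_t loc) {
  const auto first = totals.begin() + 3 * loc;
  return std::accumulate(first, first + 3, 0);
}

// The two moves listed in the scenario: B from "12" (index 5) to "20"
// (index 6), and C from "02" (index 2) to "21" (index 7).
ServerTotals AfterExpectedMoves() {
  return ApplyMove(ApplyMove(kBefore, 5, 6), 2, 7);
}
```

Dividing {{ReplicasInLocation()}} by 3 (the number of tablet servers per 
location) gives the location load, which drops from 2 to 5/3 for L0 and L1 and 
rises from 1 to 5/3 for L2.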



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
