Alexey Serbin created KUDU-3511:
-----------------------------------
Summary: Improve 'kudu rebalance' tool to even out location loads
Key: KUDU-3511
URL: https://issues.apache.org/jira/browse/KUDU-3511
Project: Kudu
Issue Type: Improvement
Components: CLI
Affects Versions: 1.16.0, 1.15.0, 1.14.0, 1.13.0, 1.11.1, 1.12.0, 1.11.0,
1.10.1, 1.10.0, 1.9.0, 1.17.0
Reporter: Alexey Serbin
In Kudu 1.17.0 and earlier, rebalancing a location-aware cluster doesn't
even out location loads properly (the location load is defined as "the total
number of replicas in a location divided by the number of tablet servers in the
location"). The issue is prominent enough to notice when rebalancing, for
example, a Kudu cluster with 3 locations and 3 tablet servers in each
location, where all tables have a replication factor of 5.
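The definition of location load quoted above can be written down as a minimal
sketch. The {{LocationStats}} type and both function names are hypothetical
helpers introduced only for illustration. They are not part of the Kudu code
base, and the "skew" here is simply the gap between the most and the least
loaded locations.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper type (not a Kudu type): per-location totals.
struct LocationStats {
  int replica_count;  // total number of replicas hosted in the location
  int tserver_count;  // number of tablet servers in the location
};

// Location load as defined in this report: replicas per tablet server.
double LocationLoad(const LocationStats& stats) {
  assert(stats.tserver_count > 0);
  return static_cast<double>(stats.replica_count) / stats.tserver_count;
}

// Gap between the most and the least loaded locations. A value close to
// zero means the location loads are evened out.
double LocationLoadSkew(const std::map<std::string, LocationStats>& locations) {
  assert(!locations.empty());
  double min_load = LocationLoad(locations.begin()->second);
  double max_load = min_load;
  for (const auto& entry : locations) {
    const double load = LocationLoad(entry.second);
    min_load = std::min(min_load, load);
    max_load = std::max(max_load, load);
  }
  return max_load - min_load;
}
```

For instance, a location hosting 6 replicas on 3 tablet servers has a load of
2.0, and a location hosting 3 replicas on 3 tablet servers has a load of 1.0.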
This behavior is explained by the code in
{{LocationBalancingAlgo::IsBalancingNeeded()}}: the criteria for detecting an
imbalance that still needs fixing are all per-table, and the imbalance in
location loads isn't considered at all.
Essentially, the following test scenario (if added to
{{$KUDU_ROOT/src/kudu/rebalance/rebalance_algo-test.cc}}) fails because the
algorithm suggests no moves instead of the two moves expected in the
scenario below.
{noformat}
TEST(RebalanceAlgoUnitTest, RF5) {
  const TestClusterConfig kConfigs[] = {
    {
      {
        { "L0", { "00", "01", "02", }, },
        { "L1", { "10", "11", "12", }, },
        { "L2", { "20", "21", "22", }, },
      },
      { "00", "01", "02", "10", "11", "12", "20", "21", "22", },
      {
        { "A", "", { 1, 1, 0, 1, 1, 0, 1, 0, 0, } },
        { "B", "", { 0, 1, 1, 1, 0, 1, 0, 1, 0, } },
        { "C", "", { 1, 0, 1, 0, 1, 1, 0, 0, 1, } },
      },
      {
        { "B", "", "12", "20" },
        { "C", "", "02", "21" },
      }
    },
  };
  VERIFY_LOCATION_BALANCING_MOVES(kConfigs);
}
{noformat}
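To make the numbers in the scenario easier to follow, here is a standalone
sketch that tallies replicas per location before and after the two moves. It
does not use any Kudu code: all names below are hypothetical, and it assumes
each move entry reads as table, an empty second field, source tablet server,
destination tablet server. Under that reading, the locations start with 6, 6
and 3 replicas (loads 2.0, 2.0 and 1.0) and end up with 5 replicas each, while
every table keeps a 2/2/1 split across locations both before and after, which
would be consistent with purely per-table criteria not detecting the imbalance.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumTables = 3;           // tables A, B, C
constexpr std::size_t kNumServers = 9;          // "00" .. "22"
constexpr std::size_t kNumLocations = 3;        // L0, L1, L2
constexpr std::size_t kServersPerLocation = 3;

// Rows are tables A, B, C; columns are tablet servers in scenario order:
// 00, 01, 02, 10, 11, 12, 20, 21, 22.
using ReplicaMatrix = std::array<std::array<int, kNumServers>, kNumTables>;

// Replica counts copied from the scenario in this report.
ReplicaMatrix ScenarioReplicas() {
  return {{
      {{1, 1, 0, 1, 1, 0, 1, 0, 0}},  // table A
      {{0, 1, 1, 1, 0, 1, 0, 1, 0}},  // table B
      {{1, 0, 1, 0, 1, 1, 0, 0, 1}},  // table C
  }};
}

// Moves one replica of `table` from server index `from` to server index `to`.
void MoveReplica(ReplicaMatrix* replicas, std::size_t table,
                 std::size_t from, std::size_t to) {
  assert((*replicas)[table][from] > 0);
  --(*replicas)[table][from];
  ++(*replicas)[table][to];
}

// Number of replicas of one table hosted in one location.
int TableReplicasInLocation(const ReplicaMatrix& replicas, std::size_t table,
                            std::size_t location) {
  int total = 0;
  const std::size_t begin = location * kServersPerLocation;
  for (std::size_t s = begin; s < begin + kServersPerLocation; ++s) {
    total += replicas[table][s];
  }
  return total;
}

// Number of replicas of all tables hosted in one location.
int ReplicasInLocation(const ReplicaMatrix& replicas, std::size_t location) {
  int total = 0;
  for (std::size_t t = 0; t < kNumTables; ++t) {
    total += TableReplicasInLocation(replicas, t, location);
  }
  return total;
}

// The scenario after applying the two moves listed in the report.
ReplicaMatrix ScenarioAfterMoves() {
  ReplicaMatrix replicas = ScenarioReplicas();
  MoveReplica(&replicas, 1, 5, 6);  // table B: "12" -> "20"
  MoveReplica(&replicas, 2, 2, 7);  // table C: "02" -> "21"
  return replicas;
}
```

Dividing each per-location total by {{kServersPerLocation}} gives the location
load as defined at the top of this report.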