[ https://issues.apache.org/jira/browse/KUDU-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shenxingwuying resolved KUDU-3390.
----------------------------------
    Fix Version/s: 1.17.0
       Resolution: Fixed

> add new feature auto leader rebalancer
> --------------------------------------
>
>                 Key: KUDU-3390
>                 URL: https://issues.apache.org/jira/browse/KUDU-3390
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major
>             Fix For: 1.17.0
>
>
> The original Jira is https://issues.apache.org/jira/browse/KUDU-3061. I
> created this new Jira issue to record some additional information.
>
>
> h1. Motivation
> The number of leader replicas per tablet server can become imbalanced over
> time, which leads to load skew on some nodes.
> There are two reasons for the load skew:
> * The main reason: scan requests have two modes, LeaderOnly (the default) and
> CLOSEST_REPLICA. For more accurate results, users choose the LeaderOnly
> (default) mode, so in most cases the scan load on a tserver is positively
> correlated with the number of leaders it hosts.
> * The other reason: for writes, leaders receive the write requests while
> followers receive AppendEntries calls (UpdateConsensus in Kudu). The two
> processing paths differ slightly, which introduces hidden variables that may
> cause imbalanced load. Leader rebalancing evens out leaders and followers,
> eliminates these hidden variables, and makes the service more stable.
> To deal with this today, users can write a script around the kudu CLI
> leader_step_down command to rebalance the leaders, and SREs have to run that
> script periodically.
>
> In our deployment we have more than 1500 Kudu clusters, and more will be
> deployed, so it is hard for SREs to maintain the rebalance script tasks.
> Kudu already has an auto rebalancer for replicas but no automatic leader
> rebalancing. We can do better: the leader kudu-master can rebalance leaders
> automatically.
> h1. Solution
> We can add an auto leader rebalance task to avoid leader replica skew: a
> periodic task running on kudu-master performs the leader rebalancing.
> Leader rebalancing only transfers leadership; it does not copy replicas. The
> basic idea of the algorithm is that, on every tserver, number of leaders :
> number of followers = 1 : (replica_factor - 1).
> If leader rebalancing is needed, the replica rebalancer should be enabled as
> well. With the leader rebalancer enabled but the auto rebalancer disabled, the
> algorithm still works, but the result is not as good. The algorithm
> converges, and its target is that, among the replicas on every tserver,
> number of leaders : number of followers = 1 : (replica_factor - 1).
> h1. Leader Rebalance results
> I ran some experiments to measure the effect. The cluster has 3 machines,
> with 3 master instances and 3 tserver instances.
> I created a table with 40 tablets (partitions) and a replica_factor of 3, and
> loaded a lot of data (40000000 records).
> I disabled the leader rebalance function, manually transferred the leaders
> of all tablets to one tserver, and ran writes and scans.
> Then I enabled the leader rebalance function and ran scans. The workload is
> as below:
> The scan command: {{./kudu_tools/kudu perf table_scan $master_list Student
> -columns=id,name,brief,age,score -num_threads=4 -nofill_cache
> -replica_selection="LEADER"}}
>
> In the table, a triple such as 40: 0: 0 or 47%, 18%, 19% lists the values
> for node1 : node2 : node3, in that order.
>
> || ||leader ratio||scan cost||cpu usage||memory||io||
> |before leader rebalance|40: 0: 0|811.586 s|47%, 18%, 19%|no changes|102MB/s ioutil:55%, 8KB/s ioutil:2%, 64KB/s ioutil:3%|
> |after leader rebalance|13: 14: 13|611.012 s|39%, 45%, 35%|no changes|53MB/s ioutil:31%, 80MB/s ioutil:18%, 45MB/s ioutil:24%|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
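
The 1 : (replica_factor - 1) target described in the Solution section can be illustrated with a small Python sketch. This is not the code from the Kudu patch. The function names `target_leader_range` and `is_balanced` are hypothetical, and the sketch assumes that "balanced" means each tserver leads between floor(replicas / replica_factor) and ceil(replicas / replica_factor) of the replicas it hosts. The input numbers are taken from the experiment reported in the issue (40 tablets, replica_factor 3, 3 tservers, so 40 replicas per tserver).

```python
from typing import Dict, Tuple


def target_leader_range(replicas_on_tserver: int, replica_factor: int) -> Tuple[int, int]:
    """Return (floor, ceil) of replicas / replica_factor.

    leaders : followers = 1 : (replica_factor - 1) is equivalent to
    leaders = replicas / replica_factor; the range allows for rounding.
    """
    low = replicas_on_tserver // replica_factor
    high = -(-replicas_on_tserver // replica_factor)  # ceiling division
    return low, high


def is_balanced(leaders: Dict[str, int], replicas: Dict[str, int], replica_factor: int) -> bool:
    """True if every tserver's leader count is inside its target range."""
    for tserver, replica_count in replicas.items():
        low, high = target_leader_range(replica_count, replica_factor)
        if not low <= leaders.get(tserver, 0) <= high:
            return False
    return True


# Numbers from the experiment in the issue: each node hosts 40 replicas.
replicas = {"node1": 40, "node2": 40, "node3": 40}
before = {"node1": 40, "node2": 0, "node3": 0}
after = {"node1": 13, "node2": 14, "node3": 13}

print(target_leader_range(40, 3))
print(is_balanced(before, replicas, 3))
print(is_balanced(after, replicas, 3))
```

Under this assumed definition, the 40: 0: 0 distribution is reported as unbalanced and the 13: 14: 13 distribution as balanced, since 40 / 3 lies between 13 and 14.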