[jira] [Resolved] (KUDU-3390) add new feature auto leader rebalancer

shenxingwuying (Jira) Mon, 27 Mar 2023 21:01:13 -0700


     [ 
https://issues.apache.org/jira/browse/KUDU-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


shenxingwuying resolved KUDU-3390.
----------------------------------
    Fix Version/s: 1.17.0
       Resolution: Fixed

> add new feature auto leader rebalancer
> --------------------------------------
>
>                 Key: KUDU-3390
>                 URL: https://issues.apache.org/jira/browse/KUDU-3390
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major
>             Fix For: 1.17.0
>
>
> Original Jira issue: https://issues.apache.org/jira/browse/KUDU-3061. I 
> created this new Jira issue to record some additional information.
>  
>  
> h1. Motivation
> The number of leader replicas per tablet server can become imbalanced over 
> time, which leads to load skew on some nodes.
> There are two causes of load skew:
>  * The main cause: scan requests have two modes, LeaderOnly (the default) and 
> CLOSEST_REPLICA. For more accurate results, users choose the LeaderOnly 
> mode, so in most cases the scan load on a tablet server is positively 
> correlated with the number of leaders it hosts.
>  * The other cause: write requests. Leaders receive write requests while 
> followers receive AppendEntries (UpdateConsensus in Kudu), so the two 
> processing paths differ slightly. This difference is a hidden variable that 
> may cause imbalanced load. Leader rebalancing evens out leaders and 
> followers across tablet servers, eliminates this hidden variable, and makes 
> the service more stable.
> To deal with this today, users can write a script around the kudu CLI 
> leader_step_down command to rebalance the leaders, and SREs have to run 
> that script periodically.
>  
> In our deployment we have more than 1500 Kudu clusters, and more clusters 
> will be deployed, so it is hard for SREs to maintain the rebalance script 
> tasks.
> Kudu has an automatic replica rebalancer but no automatic leader rebalancer. 
> We can do better: the leader kudu-master can rebalance leaders automatically.
> h1. Solution
> We can add an automatic leader rebalance task to avoid leader replica skew. 
> It runs periodically on kudu-master.
> Leader rebalancing only transfers leadership; it does not copy replicas. The 
> basic idea of the algorithm is that, on every tserver, number of leaders : 
> number of followers = 1 : (replication_factor - 1).
> If leader rebalancing is needed, it is better to also enable the replica 
> rebalancer. With the leader rebalancer enabled but the auto rebalancer 
> disabled, the algorithm still works, but the effect is not as good. The 
> algorithm converges.
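
The target ratio described above can be illustrated with a minimal sketch. It is not Kudu's implementation: the function name `plan_leader_transfers`, the `tablets` mapping, and the tserver names are hypothetical and exist only for this example. The sketch assumes that leaders : followers = 1 : (replication_factor - 1) is equivalent to a per-tserver leader target of replicas / replication_factor, and it greedily hands leadership from overloaded tservers to existing followers, so no replica is copied.

```python
from collections import Counter
from math import ceil


def plan_leader_transfers(tablets, replication_factor):
    """Plan leader transfers toward the 1 : (replication_factor - 1) target.

    tablets maps tablet_id -> (leader_tserver, [follower_tservers]).
    Returns a list of (tablet_id, from_tserver, to_tserver).
    """
    replicas, leaders = Counter(), Counter()
    for leader, followers in tablets.values():
        leaders[leader] += 1
        for ts in [leader, *followers]:
            replicas[ts] += 1

    # leaders : followers = 1 : (rf - 1)  <=>  leaders = replicas / rf
    target = {ts: n / replication_factor for ts, n in replicas.items()}

    moves = []
    for tablet_id, (leader, followers) in sorted(tablets.items()):
        # Skip tablets whose leader's tserver is already at or below target.
        if leaders[leader] <= ceil(target[leader]):
            continue
        # Only existing followers are eligible, and only if accepting one
        # more leader keeps them within their own target.
        candidates = [
            ts for ts in followers if leaders[ts] + 1 <= ceil(target[ts])
        ]
        if not candidates:
            continue
        # Prefer the follower furthest below its target; break ties by name.
        dest = min(candidates, key=lambda ts: (leaders[ts] - target[ts], ts))
        leaders[leader] -= 1
        leaders[dest] += 1
        moves.append((tablet_id, leader, dest))
    return moves


# Hypothetical starting point: 40 tablets, all leaders on ts1.
tablets = {f"t{i:02d}": ("ts1", ["ts2", "ts3"]) for i in range(40)}
moves = plan_leader_transfers(tablets, replication_factor=3)
```

For this hypothetical input, the plan contains 26 transfers and leaves 14 : 13 : 13 leaders on ts1 : ts2 : ts3. A single greedy pass is used here for brevity; a periodic task could simply rerun it on each cycle.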
> h1. Leader Rebalance results
> I ran some experiments to measure the effect. The cluster has 3 machines 
> running 3 master instances and 3 tserver instances.
> I created a table with 40 tablets (partitions) and a replication factor of 3, 
> and loaded a large amount of data (40000000 records).
> I disabled the leader rebalance function, manually transferred the leaders 
> of all tablets to one tserver, and ran writes and scans.
> Then I enabled the leader rebalance function and ran scans again. The 
> workload is as follows:
> The Scan command: {{./kudu_tools/kudu perf table_scan $master_list Student 
> -columns=id,name,brief,age,score -num_threads=4 -nofill_cache 
> -replica_selection="LEADER"}}
>  
> In the table below, a leader ratio such as 40: 0: 0 is given as node1 : 
> node2 : node3, and CPU usage such as 47%, 18%, 19% is given in the same 
> node order.
>  
> || ||leader ratio||scan cost||cpu usage||memory||io||
> |before leader rebalance|40: 0: 0|811.586 s|47%, 18%, 19%|no changes|102MB/s ioutil:55%, 8KB/s ioutil:2%, 64KB/s ioutil:3%|
> |after leader rebalance|13: 14: 13|611.012 s|39%, 45%, 35%|no changes|53MB/s ioutil:31%, 80MB/s ioutil:18%, 45MB/s ioutil:24%|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
