[ 
https://issues.apache.org/jira/browse/KAFKA-19048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jialun Peng updated KAFKA-19048:
--------------------------------
    External issue URL:   (was: 
https://issues.apache.org/jira/browse/KAFKA-1792)

> Minimal Movement Replica Balancing algorithm
> --------------------------------------------
>
>                 Key: KAFKA-19048
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19048
>             Project: Kafka
>          Issue Type: Improvement
>          Components: generator
>            Reporter: Jialun Peng
>            Assignee: Jialun Peng
>            Priority: Major
>
> h2. Motivation
> Kafka clusters typically require rebalancing of topic replicas after 
> horizontal scaling to evenly distribute the load across new and existing 
> brokers. The current rebalancing approach does not consider the existing 
> replica distribution, often resulting in excessive and unnecessary replica 
> movements. These unnecessary movements increase rebalance duration, consume 
> significant bandwidth and CPU resources, and potentially disrupt ongoing 
> production and consumption operations. Thus, a replica rebalancing strategy 
> that minimizes movements while achieving an even distribution of replicas is 
> necessary.
> h2. Goals
> The proposed approach prioritizes the following objectives:
>  # {*}Minimal Movement{*}: Minimize the number of replica relocations during 
> rebalancing.
>  # {*}Replica Balancing{*}: Ensure that replicas are evenly distributed 
> across brokers.
>  # {*}Anti-Affinity Support{*}: Support rack-aware allocation when enabled.
>  # {*}Leader Balancing{*}: Distribute leader replicas evenly across brokers.
>  # {*}ISR Order Optimization{*}: Optimize adjacency relationships to prevent 
> failover traffic concentration in case of broker failures.
> h2. Proposed Changes
> h3. Rack-Level Replica Distribution
> The following rules ensure balanced replica allocation at the rack level:
>  # {*}When{*} {{rackCount = replicationFactor}}:
>  ** Each rack receives exactly {{partitionCount}} replicas.
>  # {*}When{*} {{rackCount > replicationFactor}}:
>  ** If the weighted allocation {{(rackBrokers / totalBrokers × totalReplicas) ≥ 
> partitionCount}}: each such rack receives exactly {{partitionCount}} replicas.
>  ** If the weighted allocation is {{< partitionCount}}: distribute the remaining 
> replicas using a weighted remainder allocation.
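
The rack-level rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code proposed in the issue. The function name rack_replica_targets and the rack-to-broker-count input are hypothetical. The repeated capping pass and the use of the largest-remainder method for the "weighted remainder allocation" are also assumptions, since the description does not spell out those details. Every rack is assumed to have at least one broker.

```python
from fractions import Fraction
from math import floor


def rack_replica_targets(rack_brokers, partition_count, replication_factor):
    """Map each rack to its target replica count (illustrative sketch)."""
    rack_count = len(rack_brokers)
    if rack_count < replication_factor:
        raise ValueError("need rackCount >= replicationFactor")
    if rack_count == replication_factor:
        # Rule 1: one replica of every partition lands on every rack.
        return {rack: partition_count for rack in rack_brokers}

    targets = {}
    open_racks = dict(rack_brokers)
    remaining = partition_count * replication_factor  # totalReplicas
    # Rule 2a: cap racks whose weighted share reaches partition_count.
    # Assumption: repeat the pass, because capping one rack raises the
    # weighted share of the racks that are still open.
    while True:
        total_brokers = sum(open_racks.values())
        capped = [rack for rack, brokers in open_racks.items()
                  if Fraction(brokers, total_brokers) * remaining >= partition_count]
        if not capped:
            break
        for rack in capped:
            targets[rack] = partition_count
            remaining -= partition_count
            del open_racks[rack]

    # Rule 2b (assumed largest-remainder method): floor every weighted
    # share, then hand the leftover replicas to the largest fractions.
    total_brokers = sum(open_racks.values())
    quotas = {rack: Fraction(brokers, total_brokers) * remaining
              for rack, brokers in open_racks.items()}
    for rack, quota in quotas.items():
        targets[rack] = floor(quota)
    leftover = remaining - sum(targets[rack] for rack in quotas)
    # Sort by negated fractional part, then by rack name to break ties.
    by_remainder = sorted(quotas, key=lambda rack: (targets[rack] - quotas[rack], rack))
    for rack in by_remainder[:leftover]:
        targets[rack] += 1
    return targets
```

For example, with racks of 3, 2 and 2 brokers, 5 partitions and a replication factor of 2, the weighted shares are roughly 4.29, 2.86 and 2.86, so this sketch assigns 4, 3 and 3 replicas. Exact fractions are used instead of floats so that the comparison against partition_count is not affected by rounding.
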
> h3. Node-Level Replica Distribution
>  # If the number of replicas assigned to a rack is not a multiple of the 
> number of nodes in that rack, some nodes will host one additional replica 
> compared to others.
>  # {*}When{*} {{rackCount = replicationFactor}}:
>  ** If all racks have an equal number of nodes, each node will host an equal 
> number of replicas.
>  ** If rack sizes vary, nodes in larger racks will host fewer replicas on 
> average.
>  # {*}When{*} {{rackCount > replicationFactor}}:
>  ** If no rack has a significantly higher node weight, replicas will be 
> evenly distributed.
>  ** If a rack has disproportionately high node weight, those nodes will 
> receive fewer replicas.
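
A minimal sketch of the node-level split within one rack, assuming the rack's replica total has already been decided. The function name node_replica_targets and the optional current argument are hypothetical. Giving the extra replica to the nodes that already host the most replicas is an assumption made here to keep movement low; the description does not say which nodes receive the extra one.

```python
def node_replica_targets(rack_replicas, nodes, current=None):
    """Split a rack's replicas across its nodes (illustrative sketch).

    rack_replicas: replicas assigned to the rack
    nodes:         broker ids in the rack
    current:       optional {broker id: replicas hosted today}
    """
    if not nodes:
        raise ValueError("a rack needs at least one node")
    current = current or {}
    base, extra = divmod(rack_replicas, len(nodes))
    targets = {node: base for node in nodes}
    # Assumption: nodes already holding the most replicas keep the extra
    # one, so fewer replicas have to move; ties are broken by broker id.
    preferred = sorted(nodes, key=lambda node: (-current.get(node, 0), node))
    for node in preferred[:extra]:
        targets[node] += 1
    return targets
```

With rackCount = replicationFactor and 6 partitions, every rack holds 6 replicas: a rack of 2 nodes then gives 3 replicas per node, while a rack of 3 nodes gives 2 per node, which is the "larger racks host fewer replicas on average" case.
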
> h3. Anti-Affinity Support
> When anti-affinity is enabled, the rebalance algorithm ensures that replicas 
> of the same partition do not colocate on the same rack. Brokers without rack 
> configuration are excluded from anti-affinity checks.
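
The anti-affinity rule can be expressed as a small check over a candidate assignment. This is a sketch only: the function name rack_violations and both input shapes (partition to broker ids, broker id to rack name or None) are assumptions for illustration and are not taken from the Kafka code base.

```python
from collections import Counter


def rack_violations(assignment, broker_rack):
    """Return {partition: racks holding more than one of its replicas}.

    assignment:  {partition: [broker ids hosting its replicas]}
    broker_rack: {broker id: rack name, or None when no rack is configured}
    """
    violations = {}
    for partition, brokers in assignment.items():
        # Brokers without a rack are skipped, as the description requires.
        racks = Counter(broker_rack.get(broker) for broker in brokers
                        if broker_rack.get(broker) is not None)
        shared = sorted(rack for rack, count in racks.items() if count > 1)
        if shared:
            violations[partition] = shared
    return violations
```

An empty result means no two replicas of any partition share a rack. Two replicas on brokers that both lack a rack configuration are not reported, since those brokers are excluded from the check.
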
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
