[ https://issues.apache.org/jira/browse/KAFKA-19048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jialun Peng updated KAFKA-19048:
--------------------------------
    External issue URL:   (was: https://issues.apache.org/jira/browse/KAFKA-1792)

> Minimal Movement Replica Balancing algorithm
> --------------------------------------------
>
>                 Key: KAFKA-19048
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19048
>             Project: Kafka
>          Issue Type: Improvement
>          Components: generator
>            Reporter: Jialun Peng
>            Assignee: Jialun Peng
>            Priority: Major
>
> h2. Motivation
> Kafka clusters typically require rebalancing of topic replicas after
> horizontal scaling to evenly distribute the load across new and existing
> brokers. The current rebalancing approach does not consider the existing
> replica distribution, often resulting in excessive and unnecessary replica
> movements. These unnecessary movements increase rebalance duration, consume
> significant bandwidth and CPU resources, and potentially disrupt ongoing
> production and consumption operations. Thus, a replica rebalancing strategy
> that minimizes movements while achieving an even distribution of replicas is
> necessary.
> h2. Goals
> The proposed approach prioritizes the following objectives:
> # *Minimal Movement*: Minimize the number of replica relocations during
> rebalancing.
> # *Replica Balancing*: Ensure that replicas are evenly distributed
> across brokers.
> # *Anti-Affinity Support*: Support rack-aware allocation when enabled.
> # *Leader Balancing*: Distribute leader replicas evenly across brokers.
> # *ISR Order Optimization*: Optimize adjacency relationships to prevent
> failover traffic concentration in case of broker failures.
> h2. Proposed Changes
> h3. Rack-Level Replica Distribution
> The following rules ensure balanced replica allocation at the rack level:
> # *When {{rackCount = replicationFactor}}*:
> #* Each rack receives exactly {{partitionCount}} replicas.
> # *When {{rackCount > replicationFactor}}*:
> #* If weighted allocation {{(rackBrokers / totalBrokers × totalReplicas) ≥
> partitionCount}}: each rack receives exactly {{partitionCount}} replicas.
> #* If weighted allocation {{< partitionCount}}: distribute remaining
> replicas using a weighted remainder allocation.
> h3. Node-Level Replica Distribution
> # If the number of replicas assigned to a rack is not a multiple of the
> number of nodes in that rack, some nodes will host one additional replica
> compared to others.
> # *When {{rackCount = replicationFactor}}*:
> #* If all racks have an equal number of nodes, each node will host an equal
> number of replicas.
> #* If rack sizes vary, nodes in larger racks will host fewer replicas on
> average.
> # *When {{rackCount > replicationFactor}}*:
> #* If no rack has a significantly higher node weight, replicas will be
> evenly distributed.
> #* If a rack has disproportionately high node weight, those nodes will
> receive fewer replicas.
> h3. Anti-Affinity Support
> When anti-affinity is enabled, the rebalance algorithm ensures that replicas
> of the same partition do not colocate on the same rack. Brokers without rack
> configuration are excluded from anti-affinity checks.
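
The rack-level rules in the description can be read as a cap-then-remainder computation. The following is only a rough Python sketch of one such reading, not the implementation proposed in KAFKA-19048. The function name `rack_replica_targets` and its parameters are hypothetical. Two details are assumptions, because the description does not specify them: the weighted share is recomputed over the racks that are still uncapped after each capping round, and the "weighted remainder allocation" is taken to be a largest-remainder split with ties broken by rack name.

```python
import math
from fractions import Fraction


def rack_replica_targets(rack_brokers, partition_count, replication_factor):
    """Return a hypothetical per-rack replica target.

    rack_brokers maps rack name -> number of brokers in that rack (all > 0).
    """
    if len(rack_brokers) < replication_factor:
        raise ValueError("anti-affinity needs rackCount >= replicationFactor")

    remaining = partition_count * replication_factor  # totalReplicas
    open_racks = dict(rack_brokers)
    targets = {}

    # Cap every rack whose weighted share reaches partitionCount: with
    # anti-affinity a rack holds at most one replica of each partition.
    # Assumption: shares are recomputed after each capping round.
    while open_racks:
        total_brokers = sum(open_racks.values())
        capped = [
            rack for rack, brokers in open_racks.items()
            if Fraction(brokers * remaining, total_brokers) >= partition_count
        ]
        if not capped:
            break
        for rack in capped:
            targets[rack] = partition_count
            remaining -= partition_count
            del open_racks[rack]

    # Assumed "weighted remainder allocation": floor each remaining share,
    # then hand leftover replicas to the largest fractional parts.
    if open_racks:
        total_brokers = sum(open_racks.values())
        shares = {
            rack: Fraction(brokers * remaining, total_brokers)
            for rack, brokers in open_racks.items()
        }
        for rack, share in shares.items():
            targets[rack] = math.floor(share)
        leftover = remaining - sum(targets[rack] for rack in shares)
        by_fraction = sorted(shares, key=lambda r: (-(shares[r] - targets[r]), r))
        for rack in by_fraction[:leftover]:
            targets[rack] += 1

    return targets


if __name__ == "__main__":
    # Three racks, replication factor 2, six partitions: the large rack is
    # capped at partitionCount and the rest is split between the small racks.
    print(rack_replica_targets({"a": 4, "b": 1, "c": 1}, 6, 2))
```

Under this reading, the `rackCount = replicationFactor` case needs no special branch: every rack ends up capped at `partitionCount`, regardless of how many brokers it has. Exact fractions are used so that the comparison against `partitionCount` and the remainder ordering are not affected by floating-point rounding.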