Status

*Current state*: *Under Discussion*
*Author*: Jerry Cai <https://cwiki.apache.org/confluence/display/~jerrycai>
*Release*:
*Discussion thread*:
*JIRA*: KAFKA-17755 <https://issues.apache.org/jira/browse/KAFKA-17755> - AbstractPartitionAssignor can not enable RackAwareAssignment base on lead rack mode (Reopened)

Motivation

The current design of Kafka's rack-aware partition assignor has two significant flaws:

1. *Dependency on Broker-Side Configuration*: The replica.selector.class setting on the broker must be set to RackAwareReplicaSelector. This violates the principle that partition assignors should be customizable independently by the client.
2. *Violation of Kafka's Read-Write Consistency*: The existing approach disrupts Kafka's fundamental read-write consistency model, resulting in load imbalance and potential downstream inefficiencies.

These issues call for an improvement that better aligns client independence with cluster balancing.

Public Interfaces

Two new assignors are proposed. A consumer selects one of them through its partition.assignment.strategy setting:

partition.assignment.strategy=org.apache.kafka.clients.consumer.LeaderRackAwareCooperativeStickyAssignor

or

partition.assignment.strategy=org.apache.kafka.clients.consumer.LeaderRackAwareRangeAssignor

Proposed Changes

2.1 Core Ideas

The proposed changes address these issues by:

1. *Reading Only from Leader Brokers*: Clients will always fetch messages from the leader replica, removing the need to set replica.selector.class on brokers. This restores Kafka's read-write consistency model.
2. *Balancing Based on Leader Rack Information*: Balancing decisions will rely solely on the rack information of the leader replica. This simplifies the logic and ensures initial balance across racks.
3. *Optimizing Partition Assignments*: Once balance is achieved, partition assignors will prefer to assign each partition to a consumer in the same rack as its leader replica whenever possible, reducing cross-rack traffic.

2.2 New Partition Assignor Algorithm

The modified rack-aware partition assignor will:

1. Collect rack metadata of the leader replicas during assignment.
2. Distribute partitions across racks in a balanced manner while ensuring clients fetch from the leader replicas.
3. Apply a secondary optimization that allocates partitions within the same rack as the leader, as long as rack balance is maintained.

Compatibility, Deprecation, and Migration Plan

This change will not impact existing configurations where RackAwareReplicaSelector is already in use. It provides an alternative mechanism that removes the dependency on broker-side settings, offering more flexibility for client-side customization.

Test Plan

- Validate the new assignor logic across various cluster configurations and sizes.
- Measure improvements in load balancing and adherence to rack-awareness principles.
- Verify that read-write consistency is preserved under all conditions.

Rejected Alternatives

- *Continuing with Broker-Dependent Configurations*: This was deemed counterproductive because it limits client independence and disrupts load balancing.
- *Full Deprecation of Rack-Aware Assignor*: Rack awareness is critical for high availability and fault tolerance, so its complete removal was not considered.

Impact on Users

Users will benefit from:

- Independent client-side customization of partition assignors without broker configuration changes.
- Improved load balancing and reduced cross-rack traffic.
- Preservation of Kafka's core read-write consistency model.
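
Appendix: Illustrative Sketch of the Assignment Steps

The three steps listed under "New Partition Assignor Algorithm" can be illustrated with a small, self-contained Java sketch. It is only one possible reading of those steps, not the proposed implementation. The class name LeaderRackAssignSketch, the assign method, and the plain String maps are hypothetical and are not part of the Kafka API. The sketch assumes every partition and consumer has a non-null rack. Balance is treated as the primary goal (consumer loads differ by at most one partition), and matching the leader's rack is the secondary preference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; not the KIP's implementation and not a Kafka API. */
final class LeaderRackAssignSketch {

    /**
     * @param leaderRackByPartition partition name -> rack of its leader replica (assumed non-null)
     * @param rackByConsumer        consumer id -> rack of that consumer (assumed non-null)
     * @return consumer id -> partitions assigned to it
     */
    static Map<String, List<String>> assign(Map<String, String> leaderRackByPartition,
                                            Map<String, String> rackByConsumer) {
        // Sorted copies keep the result deterministic.
        Map<String, String> consumers = new TreeMap<>(rackByConsumer);
        Map<String, List<String>> assignment = new TreeMap<>();
        for (String consumer : consumers.keySet()) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }

        // Every consumer may take this many partitions without hurting balance.
        int floorQuota = leaderRackByPartition.size() / consumers.size();
        List<String> leftovers = new ArrayList<>();

        // Pass 1: place each partition in its leader's rack, staying within the floor quota.
        for (Map.Entry<String, String> p : new TreeMap<>(leaderRackByPartition).entrySet()) {
            String target = leastLoaded(assignment, consumers, p.getValue(), floorQuota - 1);
            if (target != null) {
                assignment.get(target).add(p.getKey());
            } else {
                leftovers.add(p.getKey());
            }
        }

        // Pass 2: give each leftover to a currently least-loaded consumer,
        // preferring one in the leader's rack when several are tied.
        for (String partition : leftovers) {
            int min = assignment.values().stream().mapToInt(List::size).min().getAsInt();
            String target = leastLoaded(assignment, consumers,
                                        leaderRackByPartition.get(partition), min);
            if (target == null) {
                target = leastLoaded(assignment, consumers, null, min);
            }
            assignment.get(target).add(partition);
        }
        return assignment;
    }

    /** Least-loaded consumer with load <= maxLoad, restricted to the given rack unless rack is null. */
    private static String leastLoaded(Map<String, List<String>> assignment,
                                      Map<String, String> consumers,
                                      String rack, int maxLoad) {
        String best = null;
        for (Map.Entry<String, String> c : consumers.entrySet()) {
            int load = assignment.get(c.getKey()).size();
            if (load > maxLoad || (rack != null && !rack.equals(c.getValue()))) {
                continue;
            }
            if (best == null || load < assignment.get(best).size()) {
                best = c.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> leaders = Map.of("t-0", "A", "t-1", "A", "t-2", "B", "t-3", "B");
        Map<String, String> members = Map.of("c1", "A", "c2", "B", "c3", "C");
        // prints {c1=[t-0], c2=[t-2, t-3], c3=[t-1]}
        System.out.println(assign(leaders, members));
    }
}
```

In the example, consumer c3 sits in a rack that hosts no leader, yet it still receives a partition because balance takes priority. The extra fourth partition then goes to a consumer in its leader's rack. A real assignor would also have to handle multiple topics, per-consumer subscriptions, and stickiness, all of which this sketch leaves out.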