Hi Everyone,

Recently I came across a Kafka setup in which two data centers are close to each other, but the company could not find a suitable location for the third one. As a result, the third DC is a little further away and has lower network throughput, but its network latency is still decent enough to qualify for a stretch cluster. Let us assume that client applications are deployed only in the two "primary" DCs. My idea is to minimize network traffic between DC3 and the other data centers (ideally limiting it to replication only).
For Kafka consumers, we can configure rack awareness so that they read from the closest replica (client.rack on the consumer, together with replica.selector.class on the brokers). Kafka producers, however, have to send data to partition leaders. As far as I can tell, there is no way to tell Kafka that we prefer partition leaders to run in DC1 and DC2. Kafka will also try to balance leaders evenly across brokers (auto.leader.rebalance.enable).

Does it sound like a good feature to make the choice of partition leaders pluggable? Basically, users would be given a list of topic-partitions with their ISRs and the racks those replicas run in, and could reshuffle leadership according to custom logic.

Comments appreciated.

Kind regards,
Lukasz
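
P.S. To make the idea a bit more concrete, here is a rough sketch of what such a plugin could look like. Everything in it is hypothetical: the PartitionLeaderSelector interface, the rankLeaderCandidates method and the rack names "dc1", "dc2" and "dc3" are made up for illustration and do not exist in Kafka today. It has no Kafka dependencies and only shows the "prefer leaders in the primary DCs" policy applied to a single partition's ISR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical plugin interface; nothing like this exists in Kafka today. */
interface PartitionLeaderSelector {
    /**
     * @param isr        in-sync replica broker ids, in their current order
     * @param brokerRack broker id to rack (here: data center) name
     * @return the ISR reordered by leadership preference, most preferred first
     */
    List<Integer> rankLeaderCandidates(String topic, int partition,
                                       List<Integer> isr,
                                       Map<Integer, String> brokerRack);
}

/** Example policy: brokers in the preferred racks come first, the rest keep their order. */
class PreferredRackLeaderSelector implements PartitionLeaderSelector {
    private final Set<String> preferredRacks;

    PreferredRackLeaderSelector(Set<String> preferredRacks) {
        this.preferredRacks = preferredRacks;
    }

    @Override
    public List<Integer> rankLeaderCandidates(String topic, int partition,
                                              List<Integer> isr,
                                              Map<Integer, String> brokerRack) {
        List<Integer> preferred = new ArrayList<>();
        List<Integer> fallback = new ArrayList<>();
        for (Integer broker : isr) {
            String rack = brokerRack.get(broker);
            // Brokers with an unknown rack are treated as non-preferred.
            if (rack != null && preferredRacks.contains(rack)) {
                preferred.add(broker);
            } else {
                fallback.add(broker);
            }
        }
        // Non-preferred brokers stay eligible, so a partition whose ISR has
        // shrunk to DC3 only can still elect a leader.
        preferred.addAll(fallback);
        return preferred;
    }
}
```

With preferred racks "dc1" and "dc2", an ISR of [3, 1, 2] where broker 3 sits in "dc3" would be ranked as [1, 2, 3], so a DC3 broker would become leader only when no in-sync replica is left in the primary DCs.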