A. Sophie Blee-Goldman created KAFKA-12477:
----------------------------------------------

             Summary: Smart rebalancing with dynamic protocol selection
                 Key: KAFKA-12477
                 URL: https://issues.apache.org/jira/browse/KAFKA-12477
             Project: Kafka
          Issue Type: Improvement
          Components: consumer
            Reporter: A. Sophie Blee-Goldman
             Fix For: 3.0.0


Users who want to upgrade their applications and enable the COOPERATIVE 
rebalancing protocol in their consumer apps are required to follow a double 
rolling bounce upgrade path. The reason for this is laid out in the [Consumer 
Upgrades|https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol#KIP429:KafkaConsumerIncrementalRebalanceProtocol-Consumer]
 section of KIP-429. In short, the ConsumerCoordinator picks a rebalancing 
protocol in its constructor based on the list of supported partition assignors. 
It selects the highest protocol supported by every assignor in the list, and 
that choice never changes afterwards.
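
The selection rule described above can be sketched roughly as follows. This is a 
simplified, standalone model rather than the actual ConsumerCoordinator code: 
the class ProtocolSelectionSketch, the nested Assignor type, and the method 
highestCommonProtocol are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;

public class ProtocolSelectionSketch {
    // Declared oldest to newest, so a later constant counts as the "higher" protocol.
    public enum RebalanceProtocol { EAGER, COOPERATIVE }

    // Hypothetical stand-in for a partition assignor: a name plus the protocols it supports.
    public static final class Assignor {
        final String name;
        final EnumSet<RebalanceProtocol> supported;

        public Assignor(String name, RebalanceProtocol... supported) {
            this.name = name;
            this.supported = EnumSet.copyOf(Arrays.asList(supported));
        }
    }

    // Intersect the supported sets, then return the highest protocol that survives.
    public static RebalanceProtocol highestCommonProtocol(List<Assignor> assignors) {
        EnumSet<RebalanceProtocol> common = EnumSet.allOf(RebalanceProtocol.class);
        for (Assignor assignor : assignors) {
            common.retainAll(assignor.supported);
        }
        if (common.isEmpty()) {
            throw new IllegalArgumentException("No rebalance protocol is supported by every assignor");
        }
        RebalanceProtocol highest = null;
        for (RebalanceProtocol protocol : common) {
            highest = protocol; // EnumSet iterates in declaration order, so the last one wins
        }
        return highest;
    }

    public static void main(String[] args) {
        Assignor cooperativeSticky = new Assignor("cooperative-sticky",
                RebalanceProtocol.COOPERATIVE, RebalanceProtocol.EAGER);
        Assignor range = new Assignor("range", RebalanceProtocol.EAGER);

        // Mixed list: only EAGER is common to both assignors.
        System.out.println(highestCommonProtocol(Arrays.asList(cooperativeSticky, range)));
        // Cooperative-only list: COOPERATIVE becomes available.
        System.out.println(highestCommonProtocol(Arrays.asList(cooperativeSticky)));
    }
}
```

In this model a single EAGER-only assignor anywhere in the list is enough to pin 
the result to EAGER, which is why the list has to be trimmed before COOPERATIVE 
can be selected.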

This is unfortunate because the consumer may end up using the older protocol 
even after every member in the group has been updated to support the newer one. 
After the first rolling bounce of the upgrade, all members will have two 
assignors: "cooperative-sticky" and "range" (or sticky, round-robin, etc.). At 
this point the EAGER protocol will still be selected due to the presence of the 
"range" assignor, even though the "cooperative-sticky" assignor is the one that 
will be selected for use in rebalances, provided it is preferred (i.e. 
positioned first in the list). The only reason for the second rolling bounce is 
to strip off the "range" assignor and allow the upgraded members to switch over 
to COOPERATIVE. We can't allow them to use cooperative rebalancing until 
everyone has been upgraded, but once everyone has, it's safe to do so.
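
For reference, the two configuration states in the current upgrade path might 
look like the sketch below. It assumes the application was originally using the 
range assignor; the class UpgradeConfigSketch and its method names are 
hypothetical, while the "partition.assignment.strategy" key and the assignor 
class names are the standard consumer ones.

```java
import java.util.Properties;

public class UpgradeConfigSketch {
    static final String STRATEGY_KEY = "partition.assignment.strategy";
    static final String COOPERATIVE_STICKY =
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor";
    static final String RANGE =
            "org.apache.kafka.clients.consumer.RangeAssignor";

    // First rolling bounce: the new assignor is listed first (preferred),
    // and the old one is kept so not-yet-upgraded members can still join.
    public static Properties afterFirstBounce() {
        Properties props = new Properties();
        props.setProperty(STRATEGY_KEY, COOPERATIVE_STICKY + "," + RANGE);
        return props;
    }

    // Second rolling bounce: the old assignor is removed, which is what
    // currently lets the consumer select the COOPERATIVE protocol.
    public static Properties afterSecondBounce() {
        Properties props = new Properties();
        props.setProperty(STRATEGY_KEY, COOPERATIVE_STICKY);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(afterFirstBounce().getProperty(STRATEGY_KEY));
        System.out.println(afterSecondBounce().getProperty(STRATEGY_KEY));
    }
}
```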

There is already a way for the client to detect that everyone is on the new 
bytecode: if the CooperativeStickyAssignor is selected by the group 
coordinator, then it is supported by all consumers in the group and 
therefore everyone must have been upgraded.

We may be able to save the second rolling bounce by dynamically updating the 
rebalancing protocol inside the ConsumerCoordinator as "the highest protocol 
supported by the assignor chosen by the group coordinator". This means we'll 
still be using EAGER at the first rebalance, since we of course need to wait 
for this initial rebalance to get the response from the group coordinator. But 
we should take the hint from the chosen assignor rather than dropping this 
information on the floor and sticking with the original protocol.
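
A minimal sketch of the proposed behaviour, under the assumption that the 
coordinator learns the chosen assignor's name after each rebalance. This is not 
the actual ConsumerCoordinator code: DynamicProtocolSketch, onAssignorChosen, 
and protocol are hypothetical names, and a real change would also need to 
handle details this model ignores.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class DynamicProtocolSketch {
    // Declared oldest to newest, so a later constant counts as the "higher" protocol.
    public enum RebalanceProtocol { EAGER, COOPERATIVE }

    private final Map<String, EnumSet<RebalanceProtocol>> assignors;
    private RebalanceProtocol protocol;

    // Start as today: the highest protocol supported by every configured assignor.
    public DynamicProtocolSketch(Map<String, EnumSet<RebalanceProtocol>> assignors) {
        this.assignors = new LinkedHashMap<>(assignors);
        EnumSet<RebalanceProtocol> common = EnumSet.allOf(RebalanceProtocol.class);
        for (EnumSet<RebalanceProtocol> supported : this.assignors.values()) {
            common.retainAll(supported);
        }
        this.protocol = highest(common);
    }

    // Proposed step: once the group coordinator reports which assignor it chose,
    // switch to the highest protocol that this particular assignor supports.
    public void onAssignorChosen(String assignorName) {
        EnumSet<RebalanceProtocol> supported = assignors.get(assignorName);
        if (supported == null) {
            throw new IllegalArgumentException("Unknown assignor: " + assignorName);
        }
        this.protocol = highest(supported);
    }

    public RebalanceProtocol protocol() {
        return protocol;
    }

    private static RebalanceProtocol highest(EnumSet<RebalanceProtocol> protocols) {
        if (protocols.isEmpty()) {
            throw new IllegalStateException("No supported rebalance protocol");
        }
        RebalanceProtocol result = null;
        for (RebalanceProtocol p : protocols) {
            result = p; // EnumSet iterates in declaration order, so the last one wins
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, EnumSet<RebalanceProtocol>> configured = new LinkedHashMap<>();
        configured.put("cooperative-sticky",
                EnumSet.of(RebalanceProtocol.COOPERATIVE, RebalanceProtocol.EAGER));
        configured.put("range", EnumSet.of(RebalanceProtocol.EAGER));

        DynamicProtocolSketch coordinator = new DynamicProtocolSketch(configured);
        System.out.println(coordinator.protocol()); // still EAGER before the first rebalance

        coordinator.onAssignorChosen("cooperative-sticky");
        System.out.println(coordinator.protocol()); // upgraded once the group agrees
    }
}
```

In this model, if a later rebalance selects "range" again (for example because 
an old member joined), the protocol falls back to EAGER.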



--
This message was sent by Atlassian Jira
(v8.3.4#803005)