Hi Sagar, I have some thoughts about Kafka Connect integrating with KIP-848, but I think we should have a separate discussion thread for the Kafka Connect KIP: Integrating Kafka Connect With New Consumer Rebalance Protocol [1], and let this discussion thread focus on consumer rebalance protocol, WDYT?
[1] https://cwiki.apache.org/confluence/display/KAFKA/%5BDRAFT%5DIntegrating+Kafka+Connect+With+New+Consumer+Rebalance+Protocol Thank you. Luke On Fri, Aug 12, 2022 at 9:31 PM Sagar <sagarmeansoc...@gmail.com> wrote: > Thank you Guozhang/David for the feedback. Looks like there's agreement on > using separate APIs for Connect. I would revisit the doc and see what > changes are to be made. > > Thanks! > Sagar. > > On Tue, Aug 9, 2022 at 7:11 PM David Jacot <dja...@confluent.io.invalid> > wrote: > > > Hi Sagar, > > > > Thanks for the feedback and the document. That's really helpful. I > > will take a look at it. > > > > Overall, it seems to me that both Connect and the Consumer could share > > the same underlying "engine". The main difference is that the Consumer > > assigns topic-partitions to members whereas Connect assigns tasks to > > workers. I see two ways to move forward: > > 1) We extend the new proposed APIs to support different resource types > > (e.g. partitions, tasks, etc.); or > > 2) We use new dedicated APIs for Connect. The dedicated APIs would be > > similar to the new ones but different on the content/resources and > > they would rely on the same engine on the coordinator side. > > > > I personally lean towards 2) because I am not a fan of overcharging > > APIs to serve different purposes. That being said, I am not opposed to > > 1) if we can find an elegant way to do it. > > > > I think that we can continue to discuss it here for now in order to > > ensure that this KIP is compatible with what we will do for Connect in > > the future. > > > > Best, > > David > > > > On Mon, Aug 8, 2022 at 2:41 PM David Jacot <dja...@confluent.io> wrote: > > > > > > Hi all, > > > > > > I am back from vacation. I will go through and address your comments > > > in the coming days. Thanks for your feedback. > > > > > > Cheers, > > > David > > > > > > On Wed, Aug 3, 2022 at 10:05 PM Gregory Harris <gharris1...@gmail.com> > > wrote: > > > > > > > > Hey All! > > > > > > > > Thanks for the KIP, it's wonderful to see cooperative rebalancing > > making it > > > > down the stack! > > > > > > > > I had a few questions: > > > > > > > > 1. The 'Rejected Alternatives' section describes how member epoch > > should > > > > advance in step with the group epoch and assignment epoch values. I > > think > > > > that this is a good idea for the reasons described in the KIP. When > the > > > > protocol is incrementally assigning partitions to a worker, what > member > > > > epoch does each incremental assignment use? Are member epochs > re-used, > > and > > > > a single member epoch can correspond to multiple different > > (monotonically > > > > larger) assignments? > > > > > > > > 2. Is the Assignor's 'Reason' field opaque to the group coordinator? > If > > > > not, should custom client-side assignor implementations interact with > > the > > > > Reason field, and how is its common meaning agreed upon? If so, what > > is the > > > > benefit of a distinct Reason field over including such functionality > > in the > > > > opaque metadata? > > > > > > > > 3. The following is included in the KIP: "Thanks to this, the input > of > > the > > > > client side assignor is entirely driven by the group coordinator. The > > > > consumer is no longer responsible for maintaining any state besides > its > > > > assigned partitions." Does this mean that the client-side assignor > MAY > > > > incorporate additional non-Metadata state (such as partition > > throughput, > > > > cpu/memory metrics, config topics, etc), or that additional > > non-Metadata > > > > state SHOULD NOT be used? > > > > > > > > 4. I see that there are separate classes > > > > for org.apache.kafka.server.group.consumer.PartitionAssignor > > > > and org.apache.kafka.clients.consumer.PartitionAssignor that seem to > > > > overlap significantly. Is it possible for these two implementations > to > > be > > > > unified? This would serve to promote feature parity of server-side > and > > > > client-side assignors, and would also facilitate operational > > flexibility in > > > > certain situations. For example, if a server-side assignor has some > > poor > > > > behavior and needs a patch, deploying the patched assignor to the > > client > > > > and switching one consumer group to a client-side assignor may be > > faster > > > > and less risky than patching all of the brokers. With the currently > > > > proposed distinct APIs, a non-trivial reimplementation would have to > be > > > > assembled, and if the two APIs have diverged significantly, then it > is > > > > possible that a reimplementation would not be possible. > > > > > > > > -- > > > > Greg Harris > > > > gharris1...@gmail.com > > > > github.com/gharris1727 > > > > > > > > On Wed, Aug 3, 2022 at 8:39 AM Sagar <sagarmeansoc...@gmail.com> > > wrote: > > > > > > > > > Hi Guozhang/David, > > > > > > > > > > I created a confluence page to discuss how Connect would need to > > change > > > > > based on the new rebalance protocol. Here's the page: > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/%5BDRAFT%5DIntegrating+Kafka+Connect+With+New+Consumer+Rebalance+Protocol > > > > > > > > > > It's also pretty longish and I have tried to keep a format similar > to > > > > > KIP-848. Let me know what you think. Also, do you think this should > > be > > > > > moved to a separate discussion thread or is this one fine? > > > > > > > > > > Thanks! > > > > > Sagar. > > > > > > > > > > On Tue, Jul 26, 2022 at 7:37 AM Sagar <sagarmeansoc...@gmail.com> > > wrote: > > > > > > > > > > > Hello Guozhang, > > > > > > > > > > > > Thank you so much for the doc on Kafka Streams. Sure, I would do > > the > > > > > > analysis and come up with such a document. > > > > > > > > > > > > Thanks! > > > > > > Sagar. > > > > > > > > > > > > On Tue, Jul 26, 2022 at 4:47 AM Guozhang Wang < > wangg...@gmail.com> > > > > > wrote: > > > > > > > > > > > >> Hello Sagar, > > > > > >> > > > > > >> It would be great if you could come back with some analysis on > > how to > > > > > >> implement the Connect side integration with the new protocol; so > > far > > > > > >> besides leveraging on the new "protocol type" we did not yet > think > > > > > through > > > > > >> the Connect side implementations. For Streams here's a draft of > > > > > >> integration > > > > > >> plan: > > > > > >> > > > > > >> > > > > > > > > https://docs.google.com/document/d/17PNz2sGoIvGyIzz8vLyJTJTU2rqnD_D9uHJnH9XARjU/edit#heading=h.pdgirmi57dvn > > > > > >> just FYI for your analysis on Connect. > > > > > >> > > > > > >> On Tue, Jul 19, 2022 at 10:48 PM Sagar < > sagarmeansoc...@gmail.com > > > > > > > > wrote: > > > > > >> > > > > > >> > Hi David, > > > > > >> > > > > > > >> > Thank you for your response. The reason I thought connect can > > also fit > > > > > >> into > > > > > >> > this new scheme is that even today the connect uses a > > > > > WorkerCoordinator > > > > > >> > extending from AbstractCoordinator to empower rebalances of > > > > > >> > tasks/connectors. The WorkerCoordinator sets the > protocolType() > > to > > > > > >> connect > > > > > >> > and uses the metadata() method by plumbing into > > > > > >> JoinGroupRequestProtocol. > > > > > >> > > > > > > >> > I think the changes to support connect would be similar at a > > high > > > > > level > > > > > >> to > > > > > >> > the changes in streams mainly because of the Client side > > assignors > > > > > being > > > > > >> > used in both. At an implementation level, we might need to > make > > a lot > > > > > of > > > > > >> > changes to get onto this new assignment protocol like > enhancing > > the > > > > > >> > JoinGroup request/response and SyncGroup and using > > > > > >> ConsumerGroupHeartbeat > > > > > >> > API etc again on similar lines to streams (or there might be > > > > > >> deviations). I > > > > > >> > would try to perform a detailed analysis of the same and we > > can have > > > > > a > > > > > >> > separate discussion thread for that as that would derail this > > > > > discussion > > > > > >> > thread. Let me know if that sounds good to you. > > > > > >> > > > > > > >> > Thanks! > > > > > >> > Sagar. > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > On Fri, Jul 15, 2022 at 5:47 PM David Jacot > > > > > <dja...@confluent.io.invalid > > > > > >> > > > > > > >> > wrote: > > > > > >> > > > > > > >> > > Hi Sagar, > > > > > >> > > > > > > > >> > > Thanks for your comments. > > > > > >> > > > > > > > >> > > 1) Yes. That refers to `Assignment#error`. Sure, I can > > mention it. > > > > > >> > > > > > > > >> > > 2) The idea is to transition C from his current assignment > to > > his > > > > > >> > > target assignment when he can move to epoch 3. When that > > happens, > > > > > the > > > > > >> > > member assignment is updated and persisted with all its > > assigned > > > > > >> > > partitions even if they are not all revoked yet. In other > > words, the > > > > > >> > > member assignment becomes the target assignment. This is > > basically > > > > > an > > > > > >> > > optimization to avoid having to write all the changes to the > > log. > > > > > The > > > > > >> > > examples are based on the persisted state so I understand > the > > > > > >> > > confusion. Let me see if I can improve this in the > > description. > > > > > >> > > > > > > > >> > > 3) Regarding Connect, it could reuse the protocol with a > > client side > > > > > >> > > assignor if it fits in the protocol. The assignment is about > > > > > >> > > topicid-partitions + metadata, could Connect fit into this? > > > > > >> > > > > > > > >> > > Best, > > > > > >> > > David > > > > > >> > > > > > > > >> > > On Fri, Jul 15, 2022 at 1:55 PM Sagar < > > sagarmeansoc...@gmail.com> > > > > > >> wrote: > > > > > >> > > > > > > > > >> > > > Hi David, > > > > > >> > > > > > > > > >> > > > Thanks for the KIP. I just had minor observations: > > > > > >> > > > > > > > > >> > > > 1) In the Assignment Error section in Client Side mode > > Assignment > > > > > >> > > process, > > > > > >> > > > you mentioned => `In this case, the client side assignor > can > > > > > return > > > > > >> an > > > > > >> > > > error to the group coordinator`. In this case are you > > referring to > > > > > >> the > > > > > >> > > > Assignor returning an AssignmentError that's listed down > > towards > > > > > the > > > > > >> > end? > > > > > >> > > > If yes, do you think it would make sense to mention this > > > > > explicitly > > > > > >> > here? > > > > > >> > > > > > > > > >> > > > 2) In the Case Studies section, I have a slight confusion, > > not > > > > > sure > > > > > >> if > > > > > >> > > > others have the same. Consider this step: > > > > > >> > > > > > > > > >> > > > When B heartbeats, the group coordinator transitions him > to > > epoch > > > > > 3 > > > > > >> > > because > > > > > >> > > > B has no partitions to revoke. It persists the change and > > reply. > > > > > >> > > > > > > > > >> > > > - Group (epoch=3) > > > > > >> > > > - A > > > > > >> > > > - B > > > > > >> > > > - C > > > > > >> > > > - Target Assignment (epoch=3) > > > > > >> > > > - A - partitions=[foo-0] > > > > > >> > > > - B - partitions=[foo-2] > > > > > >> > > > - C - partitions=[foo-1] > > > > > >> > > > - Member Assignment > > > > > >> > > > - A - epoch=2, partitions=[foo-0, foo-1] > > > > > >> > > > - B - epoch=3, partitions=[foo-2] > > > > > >> > > > - C - epoch=3, partitions=[foo-1] > > > > > >> > > > > > > > > >> > > > When C heartbeats, it transitions to epoch 3 but cannot > get > > foo-1 > > > > > >> yet. > > > > > >> > > > > > > > > >> > > > Here,it's mentioned that member C can't get the foo-1 > > partition > > > > > yet, > > > > > >> > but > > > > > >> > > > based on the description above, it seems it already has > it. > > Do you > > > > > >> > think > > > > > >> > > it > > > > > >> > > > would be better to remove it and populate it only when it > > actually > > > > > >> gets > > > > > >> > > it? > > > > > >> > > > I see this in a lot of other places, so have I understood > it > > > > > >> > incorrectly > > > > > >> > > ? > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > Regarding connect , it might be out of scope of this > > discussion, > > > > > but > > > > > >> > from > > > > > >> > > > what I understood it would probably be running in client > > side > > > > > >> assignor > > > > > >> > > mode > > > > > >> > > > even on the new rebalance protocol as it has its own > Custom > > > > > >> > > Assignors(Eager > > > > > >> > > > and IncrementalCooperative). > > > > > >> > > > > > > > > >> > > > Thanks! > > > > > >> > > > > > > > > >> > > > Sagar. > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > On Fri, Jul 15, 2022 at 5:00 PM David Jacot > > > > > >> > <dja...@confluent.io.invalid > > > > > >> > > > > > > > > >> > > > wrote: > > > > > >> > > > > > > > > >> > > > > Thanks Hector! Our goal is to move forward with > > specialized API > > > > > >> > > > > instead of relying on one generic API. For Connect, we > > can apply > > > > > >> the > > > > > >> > > > > exact same pattern and reuse/share the core > > implementation on > > > > > the > > > > > >> > > > > server side. For the schema registry, I think that we > > should > > > > > >> consider > > > > > >> > > > > having a tailored API to do simple membership/leader > > election. > > > > > >> > > > > > > > > > >> > > > > Best, > > > > > >> > > > > David > > > > > >> > > > > > > > > > >> > > > > On Fri, Jul 15, 2022 at 10:22 AM Ismael Juma < > > ism...@juma.me.uk > > > > > > > > > > > >> > > wrote: > > > > > >> > > > > > > > > > > >> > > > > > Three quick comments: > > > > > >> > > > > > > > > > > >> > > > > > 1. Regarding java.util.regex.Pattern vs > > > > > >> com.google.re2j.Pattern, we > > > > > >> > > > > should > > > > > >> > > > > > document the differences in more detail before > deciding > > one > > > > > way > > > > > >> or > > > > > >> > > > > another. > > > > > >> > > > > > That said, if people pass java.util.regex.Pattern, > they > > expect > > > > > >> > their > > > > > >> > > > > > semantics to be honored. If we are doing something > > different, > > > > > >> then > > > > > >> > we > > > > > >> > > > > > should consider adding an overload with our own > Pattern > > class > > > > > (I > > > > > >> > > don't > > > > > >> > > > > > think we'd want to expose re2j's at this point). > > > > > >> > > > > > 2. Regarding topic ids, any major new protocol should > > > > > integrate > > > > > >> > fully > > > > > >> > > > > with > > > > > >> > > > > > it and should handle the topic recreation case > > correctly. > > > > > That's > > > > > >> > the > > > > > >> > > main > > > > > >> > > > > > part we need to handle. I agree with David that we'd > > want to > > > > > add > > > > > >> > > topic > > > > > >> > > > > ids > > > > > >> > > > > > to the relevant protocols that don't have it yet and > > that we > > > > > can > > > > > >> > > probably > > > > > >> > > > > > focus on the internals versus adding new APIs to the > > Java > > > > > >> Consumer > > > > > >> > > > > (unless > > > > > >> > > > > > we find that adding new APIs is required for > reasonable > > > > > >> semantics). > > > > > >> > > > > > 3. I am still not sure about the coordinator storing > the > > > > > >> configs. > > > > > >> > > It's > > > > > >> > > > > > powerful for configs to be centralized in the metadata > > log for > > > > > >> > > various > > > > > >> > > > > > reasons (auditability, visibility, consistency, etc.). > > > > > >> Similarly, I > > > > > >> > > am > > > > > >> > > > > not > > > > > >> > > > > > sure about automatically deleting configs in a way > that > > they > > > > > >> cannot > > > > > >> > > be > > > > > >> > > > > > recovered. A good property for modern systems is to > > minimize > > > > > the > > > > > >> > > number > > > > > >> > > > > of > > > > > >> > > > > > unrecoverable data loss scenarios. > > > > > >> > > > > > > > > > > >> > > > > > Ismael > > > > > >> > > > > > > > > > > >> > > > > > On Wed, Jul 13, 2022 at 3:47 PM David Jacot > > > > > >> > > <dja...@confluent.io.invalid > > > > > >> > > > > > > > > > > >> > > > > > wrote: > > > > > >> > > > > > > > > > > >> > > > > > > Thanks Guozhang. My answers are below: > > > > > >> > > > > > > > > > > > >> > > > > > > > 1) the migration path, especially the last step > when > > > > > clients > > > > > >> > > flip the > > > > > >> > > > > > > flag > > > > > >> > > > > > > > to enable the new protocol, in which we would > have a > > > > > window > > > > > >> > where > > > > > >> > > > > both > > > > > >> > > > > > > new > > > > > >> > > > > > > > protocols / rpcs and old protocols / rpcs are used > > by > > > > > >> members > > > > > >> > of > > > > > >> > > the > > > > > >> > > > > same > > > > > >> > > > > > > > group. How the coordinator could "mimic" the old > > behavior > > > > > >> while > > > > > >> > > > > using the > > > > > >> > > > > > > > new protocol is something we need to present > about. > > > > > >> > > > > > > > > > > > >> > > > > > > Noted. I just published a new version of KIP which > > includes > > > > > >> more > > > > > >> > > > > > > details about this. See the "Supporting Online > > Consumer > > > > > Group > > > > > >> > > Upgrade" > > > > > >> > > > > > > and the "Compatibility, Deprecation, and Migration > > Plan". I > > > > > >> think > > > > > >> > > that > > > > > >> > > > > > > I have to think through a few cases now but the > > overall idea > > > > > >> and > > > > > >> > > > > > > mechanism should be understandable. > > > > > >> > > > > > > > > > > > >> > > > > > > > 2) the usage of topic ids. So far as KIP-516 the > > topic ids > > > > > >> are > > > > > >> > > only > > > > > >> > > > > used > > > > > >> > > > > > > as > > > > > >> > > > > > > > part of RPCs and admin client, but they are not > > exposed > > > > > via > > > > > >> any > > > > > >> > > > > public > > > > > >> > > > > > > APIs > > > > > >> > > > > > > > to consumers yet. I think the question is, first > > should we > > > > > >> let > > > > > >> > > the > > > > > >> > > > > > > consumer > > > > > >> > > > > > > > client to be maintaining the names -> ids mapping > > itself > > > > > to > > > > > >> > fully > > > > > >> > > > > > > leverage > > > > > >> > > > > > > > on all the augmented existing RPCs and the new > RPCs > > with > > > > > the > > > > > >> > > topic > > > > > >> > > > > ids; > > > > > >> > > > > > > and > > > > > >> > > > > > > > secondly, should we ever consider exposing the > > topic ids > > > > > in > > > > > >> the > > > > > >> > > > > consumer > > > > > >> > > > > > > > public APIs as well (both subscribe/assign, as > well > > as in > > > > > >> the > > > > > >> > > > > rebalance > > > > > >> > > > > > > > listener for cases like topic > > deletion-and-recreation). > > > > > >> > > > > > > > > > > > >> > > > > > > a) Assuming that we would include converting all the > > offsets > > > > > >> > > related > > > > > >> > > > > > > RPCs to using topic ids in this KIP, the consumer > > would be > > > > > >> able > > > > > >> > to > > > > > >> > > > > > > fully operate with topic ids. That being said, it > > still has > > > > > to > > > > > >> > > provide > > > > > >> > > > > > > the topics names in various APIs so having a mapping > > in the > > > > > >> > > consumer > > > > > >> > > > > > > seems inevitable to me. > > > > > >> > > > > > > b) I don't have a strong opinion on this. Here I > > wonder if > > > > > >> this > > > > > >> > > goes > > > > > >> > > > > > > beyond the scope of this KIP. I would rather focus > on > > the > > > > > >> > internals > > > > > >> > > > > > > here and we can consider this separately if we see > > value in > > > > > >> doing > > > > > >> > > it. > > > > > >> > > > > > > > > > > > >> > > > > > > Coming back to Ismael's point about using topic ids > > in the > > > > > >> > > > > > > ConsumerGroupHeartbeatRequest, I think that there is > > one > > > > > >> > advantage > > > > > >> > > in > > > > > >> > > > > > > favour of it. The consumer will have the opportunity > > to > > > > > >> validate > > > > > >> > > that > > > > > >> > > > > > > the topics exists before passing them into the group > > > > > rebalance > > > > > >> > > > > > > protocol. Obviously, the coordinator will also > notice > > it but > > > > > >> it > > > > > >> > > does > > > > > >> > > > > > > not really have a way to reject an invalid topic in > > the > > > > > >> response. > > > > > >> > > > > > > > > > > > >> > > > > > > > I'm agreeing with David on all other minor > questions > > > > > except > > > > > >> for > > > > > >> > > the > > > > > >> > > > > > > > `subscribe(Pattern)` question: personally I think > > it's not > > > > > >> > > necessary > > > > > >> > > > > to > > > > > >> > > > > > > > deprecate the subscribe API with Pattern, but > > instead we > > > > > >> still > > > > > >> > > use > > > > > >> > > > > > > Pattern > > > > > >> > > > > > > > while just documenting that our subscription may > be > > > > > >> rejected by > > > > > >> > > the > > > > > >> > > > > > > server. > > > > > >> > > > > > > > Since the incompatible case is a very rare > scenario > > I felt > > > > > >> > using > > > > > >> > > an > > > > > >> > > > > > > > overloaded `String` based subscription may be more > > > > > >> vulnerable > > > > > >> > to > > > > > >> > > > > various > > > > > >> > > > > > > > invalid regexes. > > > > > >> > > > > > > > > > > > >> > > > > > > That could work. I have to look at the differences > > between > > > > > the > > > > > >> > two > > > > > >> > > > > > > engines to better understand the potential issues. > My > > > > > >> > > understanding is > > > > > >> > > > > > > that would work for all the basic regular > > expressions. The > > > > > >> > > differences > > > > > >> > > > > > > between the two are mainly about the various > character > > > > > >> classes. I > > > > > >> > > > > > > wonder what other people think about this. > > > > > >> > > > > > > > > > > > >> > > > > > > Best, > > > > > >> > > > > > > David > > > > > >> > > > > > > > > > > > >> > > > > > > On Tue, Jul 12, 2022 at 11:28 PM Guozhang Wang < > > > > > >> > wangg...@gmail.com > > > > > >> > > > > > > > > >> > > > > wrote: > > > > > >> > > > > > > > > > > > > >> > > > > > > > Thanks David! I think on the high level there are > > two meta > > > > > >> > > points we > > > > > >> > > > > need > > > > > >> > > > > > > > to concretize a bit more: > > > > > >> > > > > > > > > > > > > >> > > > > > > > 1) the migration path, especially the last step > when > > > > > clients > > > > > >> > > flip the > > > > > >> > > > > > > flag > > > > > >> > > > > > > > to enable the new protocol, in which we would > have a > > > > > window > > > > > >> > where > > > > > >> > > > > both > > > > > >> > > > > > > new > > > > > >> > > > > > > > protocols / rpcs and old protocols / rpcs are used > > by > > > > > >> members > > > > > >> > of > > > > > >> > > the > > > > > >> > > > > same > > > > > >> > > > > > > > group. How the coordinator could "mimic" the old > > behavior > > > > > >> while > > > > > >> > > > > using the > > > > > >> > > > > > > > new protocol is something we need to present > about. > > > > > >> > > > > > > > 2) the usage of topic ids. So far as KIP-516 the > > topic ids > > > > > >> are > > > > > >> > > only > > > > > >> > > > > used > > > > > >> > > > > > > as > > > > > >> > > > > > > > part of RPCs and admin client, but they are not > > exposed > > > > > via > > > > > >> any > > > > > >> > > > > public > > > > > >> > > > > > > APIs > > > > > >> > > > > > > > to consumers yet. I think the question is, first > > should we > > > > > >> let > > > > > >> > > the > > > > > >> > > > > > > consumer > > > > > >> > > > > > > > client to be maintaining the names -> ids mapping > > itself > > > > > to > > > > > >> > fully > > > > > >> > > > > > > leverage > > > > > >> > > > > > > > on all the augmented existing RPCs and the new > RPCs > > with > > > > > the > > > > > >> > > topic > > > > > >> > > > > ids; > > > > > >> > > > > > > and > > > > > >> > > > > > > > secondly, should we ever consider exposing the > > topic ids > > > > > in > > > > > >> the > > > > > >> > > > > consumer > > > > > >> > > > > > > > public APIs as well (both subscribe/assign, as > well > > as in > > > > > >> the > > > > > >> > > > > rebalance > > > > > >> > > > > > > > listener for cases like topic > > deletion-and-recreation). > > > > > >> > > > > > > > > > > > > >> > > > > > > > I'm agreeing with David on all other minor > questions > > > > > except > > > > > >> for > > > > > >> > > the > > > > > >> > > > > > > > `subscribe(Pattern)` question: personally I think > > it's not > > > > > >> > > necessary > > > > > >> > > > > to > > > > > >> > > > > > > > deprecate the subscribe API with Pattern, but > > instead we > > > > > >> still > > > > > >> > > use > > > > > >> > > > > > > Pattern > > > > > >> > > > > > > > while just documenting that our subscription may > be > > > > > >> rejected by > > > > > >> > > the > > > > > >> > > > > > > server. > > > > > >> > > > > > > > Since the incompatible case is a very rare > scenario > > I felt > > > > > >> > using > > > > > >> > > an > > > > > >> > > > > > > > overloaded `String` based subscription may be more > > > > > >> vulnerable > > > > > >> > to > > > > > >> > > > > various > > > > > >> > > > > > > > invalid regexes. > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > Guozhang > > > > > >> > > > > > > > > > > > > >> > > > > > > > On Tue, Jul 12, 2022 at 5:23 AM David Jacot > > > > > >> > > > > <dja...@confluent.io.invalid > > > > > >> > > > > > > > > > > > > >> > > > > > > > wrote: > > > > > >> > > > > > > > > > > > > >> > > > > > > > > Hi Ismael, > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > Thanks for your feedback. Let me answer your > > questions > > > > > >> > inline. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 1. I think it's premature to talk about target > > > > > versions > > > > > >> for > > > > > >> > > > > > > deprecation > > > > > >> > > > > > > > > and > > > > > >> > > > > > > > > > removal of the existing group protocol. Unlike > > KRaft, > > > > > >> this > > > > > >> > > > > affects a > > > > > >> > > > > > > core > > > > > >> > > > > > > > > > client protocol and hence deprecation/removal > > will be > > > > > >> > heavily > > > > > >> > > > > > > dependent > > > > > >> > > > > > > > > on > > > > > >> > > > > > > > > > how quickly applications migrate to the new > > protocol. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > That makes sense. I will remove it. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 2. The KIP says we intend to release this in > > 4.x, but > > > > > it > > > > > >> > > wasn't > > > > > >> > > > > made > > > > > >> > > > > > > > > clear > > > > > >> > > > > > > > > > why. If we added that as a way to estimate > when > > we'd > > > > > >> > > deprecate > > > > > >> > > > > and > > > > > >> > > > > > > remove > > > > > >> > > > > > > > > > the group protocol, I also suggest removing > > this part. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > Let me explain my reasoning. As explained, I > plan > > to > > > > > >> rewrite > > > > > >> > > the > > > > > >> > > > > group > > > > > >> > > > > > > > > coordinator in Java while we implement the new > > protocol. > > > > > >> This > > > > > >> > > means > > > > > >> > > > > > > > > that the internals will be slightly different > > (e.g. > > > > > >> threading > > > > > >> > > > > model). > > > > > >> > > > > > > > > Therefore, I wanted to tighten the switch from > > the old > > > > > >> group > > > > > >> > > > > > > > > coordinator to the new group coordinator to a > > major > > > > > >> release. > > > > > >> > > The > > > > > >> > > > > > > > > alternative would be to use a flag to do the > > switch > > > > > >> instead > > > > > >> > of > > > > > >> > > > > relying > > > > > >> > > > > > > > > on the software upgrade. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 3. We need to flesh out the details of the > > migration > > > > > >> story. > > > > > >> > > It > > > > > >> > > > > sounds > > > > > >> > > > > > > > > like > > > > > >> > > > > > > > > > we're saying we will support online > migrations. > > Is > > > > > that > > > > > >> > > correct? > > > > > >> > > > > We > > > > > >> > > > > > > > > should > > > > > >> > > > > > > > > > explain this in detail. It could also be done > > as a > > > > > >> separate > > > > > >> > > KIP, > > > > > >> > > > > if > > > > > >> > > > > > > it's > > > > > >> > > > > > > > > > easier. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > Yes, we will support online migrations for the > > group. > > > > > That > > > > > >> > > means > > > > > >> > > > > that > > > > > >> > > > > > > > > a group using the old protocol will be able to > > switch to > > > > > >> the > > > > > >> > > new > > > > > >> > > > > > > > > protocol. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > Let me briefly explain how that will work > though. > > It is > > > > > >> > > basically a > > > > > >> > > > > > > > > four step process: > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > 1. The cluster must be upgraded or rolled to a > > software > > > > > >> > > supporting > > > > > >> > > > > the > > > > > >> > > > > > > > > new group coordinator. Both the old and the new > > > > > >> coordinator > > > > > >> > > will > > > > > >> > > > > > > > > support the old protocol and rely on the same > > persisted > > > > > >> > > metadata so > > > > > >> > > > > > > > > they can work together. This point is an offline > > > > > >> migration. > > > > > >> > We > > > > > >> > > > > cannot > > > > > >> > > > > > > > > do this one live because it would require > > shutting down > > > > > >> the > > > > > >> > > current > > > > > >> > > > > > > > > coordinator and starting up the new one and that > > would > > > > > >> cause > > > > > >> > > > > > > > > unavailabilities. > > > > > >> > > > > > > > > 2. The cluster's metadata version/IBP must be > > upgraded > > > > > to > > > > > >> X > > > > > >> > in > > > > > >> > > > > order > > > > > >> > > > > > > > > to enable the new protocol. This cannot be done > > before > > > > > 1) > > > > > >> is > > > > > >> > > > > > > > > terminated because the old coordinator doesn't > > support > > > > > the > > > > > >> > new > > > > > >> > > > > > > > > protocol. > > > > > >> > > > > > > > > 3. The consumers must be upgraded to a version > > > > > supporting > > > > > >> the > > > > > >> > > > > online > > > > > >> > > > > > > > > migration (must have KIP-792). If the consumer > is > > > > > already > > > > > >> > > there. > > > > > >> > > > > > > > > Nothing must be done at this point. > > > > > >> > > > > > > > > 4. The consumers must be rolled with the feature > > flag > > > > > >> turned > > > > > >> > > on. > > > > > >> > > > > The > > > > > >> > > > > > > > > consumer group is automatically converted when > > the first > > > > > >> > > consumer > > > > > >> > > > > > > > > using the new protocol joins the group. While > the > > > > > members > > > > > >> > > using the > > > > > >> > > > > > > > > old protocol are being upgraded, the old > protocol > > is > > > > > >> proxied > > > > > >> > > into > > > > > >> > > > > the > > > > > >> > > > > > > > > new one. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > Let me clarify all of this in the KIP. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 4. I am happy that we are pushing the pattern > > > > > >> subscriptions > > > > > >> > > to > > > > > >> > > > > the > > > > > >> > > > > > > > > server, > > > > > >> > > > > > > > > > but it seems like there could be some tricky > > > > > >> compatibility > > > > > >> > > > > issues. > > > > > >> > > > > > > Will > > > > > >> > > > > > > > > we > > > > > >> > > > > > > > > > have a mechanism for users to detect that they > > need to > > > > > >> > update > > > > > >> > > > > their > > > > > >> > > > > > > regex > > > > > >> > > > > > > > > > before switching to the new protocol? > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > I think that I am a bit more optimistic than you > > on this > > > > > >> > > point. I > > > > > >> > > > > > > > > believe that the majority of the cases are > simple > > > > > regexes > > > > > >> > which > > > > > >> > > > > should > > > > > >> > > > > > > > > work with the new engine. The coordinator will > > verify > > > > > the > > > > > >> > regex > > > > > >> > > > > anyway > > > > > >> > > > > > > > > and reject the consumer if the regex is not > valid. > > > > > Coming > > > > > >> > back > > > > > >> > > to > > > > > >> > > > > the > > > > > >> > > > > > > > > migration path, in the worst case, the first > > upgraded > > > > > >> > consumer > > > > > >> > > > > joining > > > > > >> > > > > > > > > the group will be rejected. This should be used > > as the > > > > > >> last > > > > > >> > > > > defence, I > > > > > >> > > > > > > > > would say. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > One way for customers to validate their regex > > before > > > > > >> > upgrading > > > > > >> > > > > their > > > > > >> > > > > > > > > prod would be to test them with another group. > For > > > > > >> instance, > > > > > >> > > that > > > > > >> > > > > > > > > could be done in a pre-prod environment. Another > > way > > > > > >> would be > > > > > >> > > to > > > > > >> > > > > > > > > extend the consumer-group tool to provide a > regex > > > > > >> validation > > > > > >> > > > > > > > > mechanism. Would this be enough in your opinion? > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 5. Related to the last question, will the Java > > client > > > > > >> allow > > > > > >> > > the > > > > > >> > > > > > > users to > > > > > >> > > > > > > > > > stick with the current regex engine for > > compatibility > > > > > >> > > reasons? > > > > > >> > > > > For > > > > > >> > > > > > > > > example, > > > > > >> > > > > > > > > > it may be handy to keep using client based > > regex at > > > > > >> first > > > > > >> > to > > > > > >> > > keep > > > > > >> > > > > > > > > > migrations simple and then migrate to server > > based > > > > > >> regexes > > > > > >> > > as a > > > > > >> > > > > > > second > > > > > >> > > > > > > > > step. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > I understand your point but I am concerned that > > this > > > > > would > > > > > >> > > allow > > > > > >> > > > > users > > > > > >> > > > > > > > > to actually stay in this mode. That would go > > against our > > > > > >> goal > > > > > >> > > of > > > > > >> > > > > > > > > simplifying the client because we would have to > > continue > > > > > >> > > monitoring > > > > > >> > > > > > > > > the metadata on the client side. I would rather > > not do > > > > > >> this. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 6. When we say that the group coordinator will > > be > > > > > >> > > responsible for > > > > > >> > > > > > > storing > > > > > >> > > > > > > > > > the configurations and that the configurations > > will be > > > > > >> > > deleted > > > > > >> > > > > when > > > > > >> > > > > > > the > > > > > >> > > > > > > > > > group is deleted. Will a transition to DEAD > > trigger > > > > > >> > deletion > > > > > >> > > of > > > > > >> > > > > > > > > > configurations? > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > That's right. The configurations will be deleted > > when > > > > > the > > > > > >> > > group is > > > > > >> > > > > > > > > deleted. They go together. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 7. Will the choice to store the configs in the > > group > > > > > >> > > coordinator > > > > > >> > > > > > > make it > > > > > >> > > > > > > > > > harder to list all cluster configs and their > > values? > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > I don't think so. The group configurations are > > overrides > > > > > >> of > > > > > >> > > cluster > > > > > >> > > > > > > > > configs. If you want to know all the overrides > > though, > > > > > you > > > > > >> > > would > > > > > >> > > > > have > > > > > >> > > > > > > > > to ask all the group coordinators. You cannot > > rely on > > > > > the > > > > > >> > > metadata > > > > > >> > > > > log > > > > > >> > > > > > > > > for instance. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 8. How would someone configure a group before > > starting > > > > > >> the > > > > > >> > > > > consumers? > > > > > >> > > > > > > > > Have > > > > > >> > > > > > > > > > we considered allowing the explicit creation > of > > > > > groups? > > > > > >> > > > > > > Alternatively, > > > > > >> > > > > > > > > the > > > > > >> > > > > > > > > > configs could be decoupled from the group > > lifecycle. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > Yes. The group will be automatically created in > > this > > > > > case. > > > > > >> > > However, > > > > > >> > > > > > > > > the configs will be lost after the retention > > period of > > > > > the > > > > > >> > > group > > > > > >> > > > > > > > > passes. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 9. Will the Consumer.subscribe method for the > > Java > > > > > >> client > > > > > >> > > still > > > > > >> > > > > take > > > > > >> > > > > > > a > > > > > >> > > > > > > > > > `java.util.regex.Pattern` of do we have to > > introduce > > > > > an > > > > > >> > > overload? > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > That's a very group question. I forgot about > that > > one. > > > > > As > > > > > >> the > > > > > >> > > > > > > > > `java.util.regex.Pattern` is not fully > compatible > > with > > > > > the > > > > > >> > > engine > > > > > >> > > > > that > > > > > >> > > > > > > > > we plan to use, it might be better to deprecate > > it and > > > > > >> use an > > > > > >> > > > > overload > > > > > >> > > > > > > > > which takes a string. We would rely on the > server > > side > > > > > >> > > validation. > > > > > >> > > > > > > > > During the migration, I think that we could > still > > try to > > > > > >> > > toString > > > > > >> > > > > the > > > > > >> > > > > > > > > regex and use it. That should work, I think, in > > the > > > > > >> majority > > > > > >> > > of the > > > > > >> > > > > > > > > cases. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 10. I agree with Justine that we should be > > clearer > > > > > about > > > > > >> > the > > > > > >> > > > > reason > > > > > >> > > > > > > to > > > > > >> > > > > > > > > > switch to IBP/metadata.version from the > feature > > flag. > > > > > >> Maybe > > > > > >> > > we > > > > > >> > > > > mean > > > > > >> > > > > > > that > > > > > >> > > > > > > > > we > > > > > >> > > > > > > > > > can switch the default for the feature flag to > > true > > > > > >> based > > > > > >> > on > > > > > >> > > the > > > > > >> > > > > > > > > > metadata.version once we want to make it the > > default. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > My plan was to use that feature flag mainly > > during the > > > > > >> > > development > > > > > >> > > > > > > > > phase. I should not have mentioned it, I think, > > because > > > > > we > > > > > >> > > could > > > > > >> > > > > use > > > > > >> > > > > > > > > an internal config for it. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 11. Some of the protocol APIs don't mention > the > > > > > required > > > > > >> > > ACLs, it > > > > > >> > > > > > > would > > > > > >> > > > > > > > > be > > > > > >> > > > > > > > > > good to add that for consistency. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > Noted. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 12. It is a bit odd that > ConsumerGroupHeartbeat > > > > > requires > > > > > >> > > "Read > > > > > >> > > > > Group" > > > > > >> > > > > > > > > even > > > > > >> > > > > > > > > > though it seems to do more than reading. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > I agree. This is how the current protocol works > > though. > > > > > We > > > > > >> > only > > > > > >> > > > > > > > > require "Read Group" to join a group. We could > > consider > > > > > >> > > changing > > > > > >> > > > > this > > > > > >> > > > > > > > > but I am not sure that it is worth it. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 13. How is topic recreation handled by the > > consumer > > > > > with > > > > > >> > the > > > > > >> > > new > > > > > >> > > > > > > group > > > > > >> > > > > > > > > > protocol? It would be good to have a section > on > > this. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > Noted. From a protocol perspective, the new > topic > > will > > > > > >> have a > > > > > >> > > new > > > > > >> > > > > > > > > topic id so it will treat it like a topic with a > > > > > different > > > > > >> > > name. > > > > > >> > > > > The > > > > > >> > > > > > > > > only issue is that the fetch/commit offsets APIs > > do not > > > > > >> > support > > > > > >> > > > > topic > > > > > >> > > > > > > > > IDs so the consumer would reuse the offsets > based > > on the > > > > > >> > same. > > > > > >> > > I > > > > > >> > > > > think > > > > > >> > > > > > > > > that we should update those APIs as well in > order > > to be > > > > > >> > > consistent > > > > > >> > > > > end > > > > > >> > > > > > > > > to end. That would strengthen the semantics of > the > > > > > >> consumer. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 14. The KIP mentions we will write the new > > coordinator > > > > > >> in > > > > > >> > > Java. > > > > > >> > > > > Even > > > > > >> > > > > > > > > though > > > > > >> > > > > > > > > > this is an implementation detail, do we plan > to > > have a > > > > > >> new > > > > > >> > > gradle > > > > > >> > > > > > > module > > > > > >> > > > > > > > > > for it? > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > Yes. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 15. Do we have a scalability goal when it > comes > > to how > > > > > >> many > > > > > >> > > > > members > > > > > >> > > > > > > the > > > > > >> > > > > > > > > new > > > > > >> > > > > > > > > > group protocol can support? > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > We don't have numbers at the moment. The > protocol > > should > > > > > >> > > support > > > > > >> > > > > 1000s > > > > > >> > > > > > > > > of members per group. We will measure this when > > we have > > > > > a > > > > > >> > first > > > > > >> > > > > > > > > implementation. Note that we might have other > > > > > bottlenecks > > > > > >> > down > > > > > >> > > the > > > > > >> > > > > > > > > road (e.g. offset commits). > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > 16. Did we consider having SubscribedTopidIds > > instead > > > > > of > > > > > >> > > > > > > > > > SubscribedTopicNames in > > ConsumerGroupHeartbeatRequest? > > > > > >> Is > > > > > >> > the > > > > > >> > > > > idea > > > > > >> > > > > > > that > > > > > >> > > > > > > > > > since we have to resolve the regex on the > > server, we > > > > > >> can do > > > > > >> > > the > > > > > >> > > > > same > > > > > >> > > > > > > for > > > > > >> > > > > > > > > > the topic name? The difference is that sending > > the > > > > > >> regex is > > > > > >> > > more > > > > > >> > > > > > > > > efficient > > > > > >> > > > > > > > > > whereas sending the topic names is less > > efficient. > > > > > >> > > Furthermore, > > > > > >> > > > > > > delete > > > > > >> > > > > > > > > and > > > > > >> > > > > > > > > > recreation is easier to handle if we have > topic > > ids. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > The idea was to consolidate the metadata lookup > > on the > > > > > >> server > > > > > >> > > for > > > > > >> > > > > both > > > > > >> > > > > > > > > paths but I do agree with your point. As a > second > > > > > though, > > > > > >> > using > > > > > >> > > > > topic > > > > > >> > > > > > > > > ids may be better here for the delete and > > recreation > > > > > case. > > > > > >> > > Also, I > > > > > >> > > > > > > > > suppose that we may allow users to subscribe > with > > topic > > > > > >> ids > > > > > >> > in > > > > > >> > > the > > > > > >> > > > > > > > > future because that is the only way to be really > > robust > > > > > to > > > > > >> > > topic > > > > > >> > > > > > > > > re-creation. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > Best, > > > > > >> > > > > > > > > David > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > On Tue, Jul 12, 2022 at 1:38 PM David Jacot < > > > > > >> > > dja...@confluent.io> > > > > > >> > > > > > > wrote: > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > Hi Justine, > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > Thanks for your comments. Please find my > answers > > > > > below. > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > - Yes, the new protocol relies on topic IDs > > with the > > > > > >> > > exception > > > > > >> > > > > of the > > > > > >> > > > > > > > > > topic names based in the > > > > > ConsumerGroupHeartbeatRequest. > > > > > >> I > > > > > >> > am > > > > > >> > > not > > > > > >> > > > > sure > > > > > >> > > > > > > > > > if using topic names is the right call here. I > > need to > > > > > >> > think > > > > > >> > > > > about it > > > > > >> > > > > > > > > > a little more. Obviously, the KIP does not > > change the > > > > > >> > > > > fetch/commit > > > > > >> > > > > > > > > > offsets RPCs to use topic IDs. This may be > > something > > > > > >> that > > > > > >> > we > > > > > >> > > > > should > > > > > >> > > > > > > > > > include though as it would give better overall > > > > > >> guarantee in > > > > > >> > > the > > > > > >> > > > > > > > > > producer. > > > > > >> > > > > > > > > > - You're right. I think that I should not have > > > > > mentioned > > > > > >> > this > > > > > >> > > > > flag at > > > > > >> > > > > > > > > > all. I will remove it. We can use an internal > > > > > >> configuration > > > > > >> > > while > > > > > >> > > > > > > > > > developing the feature. > > > > > >> > > > > > > > > > - Both cluster types will be supported. The > > change is > > > > > >> > > > > orthogonal. The > > > > > >> > > > > > > > > > only requirement is that the cluster uses > topic > > IDs. > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > Best, > > > > > >> > > > > > > > > > David > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > On Mon, Jul 11, 2022 at 9:53 PM Guozhang Wang > < > > > > > >> > > > > wangg...@gmail.com> > > > > > >> > > > > > > > > wrote: > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > Hi Ismael, > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > Thanks for the feedback. Here are some > replies > > > > > inlined > > > > > >> > > below: > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > On Sat, Jul 9, 2022 at 2:53 AM Ismael Juma < > > > > > >> > > ism...@juma.me.uk> > > > > > >> > > > > > > wrote: > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > Thanks for the KIP. This has the potential > > to be a > > > > > >> > great > > > > > >> > > > > > > > > improvement. A few > > > > > >> > > > > > > > > > > > initial questions/comments: > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > 1. I think it's premature to talk about > > target > > > > > >> versions > > > > > >> > > for > > > > > >> > > > > > > > > deprecation and > > > > > >> > > > > > > > > > > > removal of the existing group protocol. > > Unlike > > > > > >> KRaft, > > > > > >> > > this > > > > > >> > > > > > > affects a > > > > > >> > > > > > > > > core > > > > > >> > > > > > > > > > > > client protocol and hence > > deprecation/removal will > > > > > >> be > > > > > >> > > heavily > > > > > >> > > > > > > > > dependent on > > > > > >> > > > > > > > > > > > how quickly applications migrate to the > new > > > > > >> protocol. > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > Yeah I agree with you. I think we can remove > > the > > > > > >> proposed > > > > > >> > > > > timeline > > > > > >> > > > > > > in > > > > > >> > > > > > > > > the > > > > > >> > > > > > > > > > > `Compatibility, Deprecation, and Migration > > Plan` and > > > > > >> > > instead > > > > > >> > > > > just > > > > > >> > > > > > > state > > > > > >> > > > > > > > > > > that we will decide in the future about when > > we > > > > > would > > > > > >> > > > > deprecate old > > > > > >> > > > > > > > > > > protocol and behaviors. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 2. The KIP says we intend to release this > > in 4.x, > > > > > >> but > > > > > >> > it > > > > > >> > > > > wasn't > > > > > >> > > > > > > made > > > > > >> > > > > > > > > clear > > > > > >> > > > > > > > > > > > why. If we added that as a way to estimate > > when > > > > > we'd > > > > > >> > > > > deprecate > > > > > >> > > > > > > and > > > > > >> > > > > > > > > remove > > > > > >> > > > > > > > > > > > the group protocol, I also suggest > removing > > this > > > > > >> part. > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > I think that's not specifically related to > the > > > > > >> > > > > deprecation/removal > > > > > >> > > > > > > > > timeline > > > > > >> > > > > > > > > > > plan, but it's more for client upgrades. > I.e. > > the > > > > > >> > > broker-side > > > > > >> > > > > > > > > > > implementation may be done first, and then > the > > > > > client > > > > > >> > side, > > > > > >> > > > > and we > > > > > >> > > > > > > > > would > > > > > >> > > > > > > > > > > only mark it as "released" by the time > clients > > > > > >> > > implementations > > > > > >> > > > > are > > > > > >> > > > > > > > > done. At > > > > > >> > > > > > > > > > > that time, to enable the feature the clients > > need to > > > > > >> > first > > > > > >> > > > > swap-in > > > > > >> > > > > > > the > > > > > >> > > > > > > > > > > bytecode with a rolling bounce and then set > > the flag > > > > > >> > with a > > > > > >> > > > > second > > > > > >> > > > > > > > > rolling > > > > > >> > > > > > > > > > > bounce, and hence we feel it's better to be > > released > > > > > >> in a > > > > > >> > > major > > > > > >> > > > > > > > > version. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 3. We need to flesh out the details of the > > > > > migration > > > > > >> > > story. > > > > > >> > > > > It > > > > > >> > > > > > > > > sounds like > > > > > >> > > > > > > > > > > > we're saying we will support online > > migrations. Is > > > > > >> that > > > > > >> > > > > correct? > > > > > >> > > > > > > We > > > > > >> > > > > > > > > should > > > > > >> > > > > > > > > > > > explain this in detail. It could also be > > done as a > > > > > >> > > separate > > > > > >> > > > > KIP, > > > > > >> > > > > > > if > > > > > >> > > > > > > > > it's > > > > > >> > > > > > > > > > > > easier. > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > Yes I think that's the part we can be more > > concrete > > > > > >> about > > > > > >> > > for > > > > > >> > > > > sure > > > > > >> > > > > > > (and > > > > > >> > > > > > > > > > > this is related to your question 2) above). > > We will > > > > > >> work > > > > > >> > on > > > > > >> > > > > making > > > > > >> > > > > > > it > > > > > >> > > > > > > > > more > > > > > >> > > > > > > > > > > explicit in parallel as we solicit more > > feedback. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 4. I am happy that we are pushing the > > pattern > > > > > >> > > subscriptions > > > > > >> > > > > to > > > > > >> > > > > > > the > > > > > >> > > > > > > > > server, > > > > > >> > > > > > > > > > > > but it seems like there could be some > tricky > > > > > >> > > compatibility > > > > > >> > > > > > > issues. > > > > > >> > > > > > > > > Will we > > > > > >> > > > > > > > > > > > have a mechanism for users to detect that > > they > > > > > need > > > > > >> to > > > > > >> > > update > > > > > >> > > > > > > their > > > > > >> > > > > > > > > regex > > > > > >> > > > > > > > > > > > before switching to the new protocol? > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > Yes I think we need some tooling for > non-java > > client > > > > > >> > users > > > > > >> > > to > > > > > >> > > > > sort > > > > > >> > > > > > > of > > > > > >> > > > > > > > > > > "dry-run" the client before switching to the > > new > > > > > >> > protocol. > > > > > >> > > I > > > > > >> > > > > do not > > > > > >> > > > > > > > > have a > > > > > >> > > > > > > > > > > specific idea on top of my head though, > maybe > > others > > > > > >> like > > > > > >> > > @Matt > > > > > >> > > > > > > > > Howlett can > > > > > >> > > > > > > > > > > chime-in here? > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 5. Related to the last question, will the > > Java > > > > > >> client > > > > > >> > > allow > > > > > >> > > > > the > > > > > >> > > > > > > > > users to > > > > > >> > > > > > > > > > > > stick with the current regex engine for > > > > > >> compatibility > > > > > >> > > > > reasons? > > > > > >> > > > > > > For > > > > > >> > > > > > > > > example, > > > > > >> > > > > > > > > > > > it may be handy to keep using client based > > regex > > > > > at > > > > > >> > > first to > > > > > >> > > > > keep > > > > > >> > > > > > > > > > > > migrations simple and then migrate to > > server based > > > > > >> > > regexes > > > > > >> > > > > as a > > > > > >> > > > > > > > > second > > > > > >> > > > > > > > > > > > step. > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > Honestly I have not thought about that for > > java > > > > > >> clients, > > > > > >> > > and > > > > > >> > > > > we can > > > > > >> > > > > > > > > discuss > > > > > >> > > > > > > > > > > that. What kind of compatibility issues do > > you have > > > > > in > > > > > >> > > mind? > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 6. When we say that the group coordinator > > will be > > > > > >> > > > > responsible for > > > > > >> > > > > > > > > storing > > > > > >> > > > > > > > > > > > the configurations and that the > > configurations > > > > > will > > > > > >> be > > > > > >> > > > > deleted > > > > > >> > > > > > > when > > > > > >> > > > > > > > > the > > > > > >> > > > > > > > > > > > group is deleted. Will a transition to > DEAD > > > > > trigger > > > > > >> > > deletion > > > > > >> > > > > of > > > > > >> > > > > > > > > > > > configurations? > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > Yes, since the DEAD state is an ending state > > (we > > > > > would > > > > > >> > only > > > > > >> > > > > > > transit to > > > > > >> > > > > > > > > that > > > > > >> > > > > > > > > > > state when the group is EMPTY and also all > of > > its > > > > > >> > metadata > > > > > >> > > are > > > > > >> > > > > > > gone), > > > > > >> > > > > > > > > once > > > > > >> > > > > > > > > > > it's transited to DEAD this group would > never > > be > > > > > >> revived. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 7. Will the choice to store the configs in > > the > > > > > group > > > > > >> > > > > coordinator > > > > > >> > > > > > > > > make it > > > > > >> > > > > > > > > > > > harder to list all cluster configs and > their > > > > > values? > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > That's a good question, and our thoughts are > > that > > > > > the > > > > > >> > > so-called > > > > > >> > > > > > > "group > > > > > >> > > > > > > > > > > configurations" are overrides of the > > cluster-level > > > > > >> > > > > configurations > > > > > >> > > > > > > > > > > customized per group so when an admin list > > cluster > > > > > >> > configs > > > > > >> > > it's > > > > > >> > > > > > > okay to > > > > > >> > > > > > > > > > > list just the cluster-level "defaults", not > > showing > > > > > >> any > > > > > >> > > > > per-group > > > > > >> > > > > > > > > > > customizations. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 8. How would someone configure a group > > before > > > > > >> starting > > > > > >> > > the > > > > > >> > > > > > > > > consumers? Have > > > > > >> > > > > > > > > > > > we considered allowing the explicit > > creation of > > > > > >> groups? > > > > > >> > > > > > > > > Alternatively, the > > > > > >> > > > > > > > > > > > configs could be decoupled from the group > > > > > lifecycle. > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > The configs can be created before the group > > itself > > > > > as > > > > > >> an > > > > > >> > > > > > > independent > > > > > >> > > > > > > > > entity > > > > > >> > > > > > > > > > > --- of course, this requires the > corresponding > > > > > >> request to > > > > > >> > > be > > > > > >> > > > > > > routed to > > > > > >> > > > > > > > > the > > > > > >> > > > > > > > > > > right coordinator based on the group id --- > > the only > > > > > >> > thing > > > > > >> > > that > > > > > >> > > > > > > > > differs is, > > > > > >> > > > > > > > > > > when the group itself is gone we also check > > if there > > > > > >> are > > > > > >> > > any > > > > > >> > > > > > > > > configuration > > > > > >> > > > > > > > > > > entities related to that group and delete as > > well. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > Admittedly this indeed introduces an > > asymmetry on > > > > > the > > > > > >> > > creation > > > > > >> > > > > / > > > > > >> > > > > > > > > deletion > > > > > >> > > > > > > > > > > lifecycles of the config entities, and we > > would like > > > > > >> to > > > > > >> > > hear > > > > > >> > > > > > > everyone's > > > > > >> > > > > > > > > > > feelings whether we should aim for symmetry > > i.e. > > > > > >> totally > > > > > >> > > > > decouple > > > > > >> > > > > > > group > > > > > >> > > > > > > > > > > configs and hence not delete them at all > when > > the > > > > > >> group > > > > > >> > is > > > > > >> > > > > gone, > > > > > >> > > > > > > but > > > > > >> > > > > > > > > always > > > > > >> > > > > > > > > > > require explicit deletion operations by > > themselves. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 9. Will the Consumer.subscribe method for > > the Java > > > > > >> > client > > > > > >> > > > > still > > > > > >> > > > > > > take > > > > > >> > > > > > > > > a > > > > > >> > > > > > > > > > > > `java.util.regex.Pattern` of do we have to > > > > > >> introduce an > > > > > >> > > > > overload? > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > I think we do not need to introduce an > > overload, but > > > > > >> I'm > > > > > >> > > all > > > > > >> > > > > ears > > > > > >> > > > > > > if > > > > > >> > > > > > > > > there > > > > > >> > > > > > > > > > > may be some compatibility issues that we may > > > > > overlook. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 10. I agree with Justine that we should be > > clearer > > > > > >> > about > > > > > >> > > the > > > > > >> > > > > > > reason > > > > > >> > > > > > > > > to > > > > > >> > > > > > > > > > > > switch to IBP/metadata.version from the > > feature > > > > > >> flag. > > > > > >> > > Maybe > > > > > >> > > > > we > > > > > >> > > > > > > mean > > > > > >> > > > > > > > > that we > > > > > >> > > > > > > > > > > > can switch the default for the feature > flag > > to > > > > > true > > > > > >> > > based on > > > > > >> > > > > the > > > > > >> > > > > > > > > > > > metadata.version once we want to make it > the > > > > > >> default. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > 11. Some of the protocol APIs don't mention > > the > > > > > >> required > > > > > >> > > ACLs, > > > > > >> > > > > it > > > > > >> > > > > > > > > would be > > > > > >> > > > > > > > > > > > good to add that for consistency. > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > Ack, we can certainly do that. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 12. It is a bit odd that > > ConsumerGroupHeartbeat > > > > > >> > requires > > > > > >> > > > > "Read > > > > > >> > > > > > > > > Group" even > > > > > >> > > > > > > > > > > > though it seems to do more than reading. > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > I had that thought myself as well, but in > the > > end we > > > > > >> > could > > > > > >> > > not > > > > > >> > > > > > > find a > > > > > >> > > > > > > > > > > better alternative: adding Write Group seems > > an > > > > > >> overkill > > > > > >> > > here > > > > > >> > > > > > > since we > > > > > >> > > > > > > > > do > > > > > >> > > > > > > > > > > not have it elsewhere (we only have Read / > > Delete > > > > > and > > > > > >> > > Describe > > > > > >> > > > > on > > > > > >> > > > > > > > > groups so > > > > > >> > > > > > > > > > > far). Would like to hear others thoughts. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 13. How is topic recreation handled by the > > > > > consumer > > > > > >> > with > > > > > >> > > the > > > > > >> > > > > new > > > > > >> > > > > > > > > group > > > > > >> > > > > > > > > > > > protocol? It would be good to have a > > section on > > > > > >> this. > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > You mean with regex subscription right? Yes > > we can > > > > > >> add a > > > > > >> > > > > section > > > > > >> > > > > > > about > > > > > >> > > > > > > > > > > that, but basically the idea is that > consumer > > would > > > > > be > > > > > >> > > totally > > > > > >> > > > > > > > > agnostic in > > > > > >> > > > > > > > > > > the new protocol as it's handled all by the > > brokers. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 14. The KIP mentions we will write the new > > > > > >> coordinator > > > > > >> > in > > > > > >> > > > > Java. > > > > > >> > > > > > > Even > > > > > >> > > > > > > > > though > > > > > >> > > > > > > > > > > > this is an implementation detail, do we > > plan to > > > > > >> have a > > > > > >> > > new > > > > > >> > > > > gradle > > > > > >> > > > > > > > > module > > > > > >> > > > > > > > > > > > for it? > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > We have not thought about that. But I think > > the > > > > > answer > > > > > >> > > should > > > > > >> > > > > be > > > > > >> > > > > > > yes. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 15. Do we have a scalability goal when it > > comes to > > > > > >> how > > > > > >> > > many > > > > > >> > > > > > > members > > > > > >> > > > > > > > > the new > > > > > >> > > > > > > > > > > > group protocol can support? > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > Within a group, I think we should shoot for > > 1000s of > > > > > >> > > members. > > > > > >> > > > > But > > > > > >> > > > > > > that > > > > > >> > > > > > > > > > > scalability goals also depend on the offset > > > > > management > > > > > >> > > (commit, > > > > > >> > > > > > > fetch) > > > > > >> > > > > > > > > > > capabilities of the coordinator which we did > > not > > > > > >> cover in > > > > > >> > > this > > > > > >> > > > > > > KIP, so > > > > > >> > > > > > > > > it's > > > > > >> > > > > > > > > > > hard to give a number that applies > > universally. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > 16. Did we consider having > > SubscribedTopidIds > > > > > >> instead > > > > > >> > of > > > > > >> > > > > > > > > > > > SubscribedTopicNames in > > > > > >> ConsumerGroupHeartbeatRequest? > > > > > >> > > Is the > > > > > >> > > > > > > idea > > > > > >> > > > > > > > > that > > > > > >> > > > > > > > > > > > since we have to resolve the regex on the > > server, > > > > > we > > > > > >> > can > > > > > >> > > do > > > > > >> > > > > the > > > > > >> > > > > > > same > > > > > >> > > > > > > > > for > > > > > >> > > > > > > > > > > > the topic name? The difference is that > > sending the > > > > > >> > regex > > > > > >> > > is > > > > > >> > > > > more > > > > > >> > > > > > > > > efficient > > > > > >> > > > > > > > > > > > whereas sending the topic names is less > > efficient. > > > > > >> > > > > Furthermore, > > > > > >> > > > > > > > > delete and > > > > > >> > > > > > > > > > > > recreation is easier to handle if we have > > topic > > > > > ids. > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > The main reason to still let the clients > send > > names > > > > > >> is to > > > > > >> > > keep > > > > > >> > > > > the > > > > > >> > > > > > > > > > > reasoning of names -> ids on the broker / > > admin > > > > > client > > > > > >> > > only. > > > > > >> > > > > Note > > > > > >> > > > > > > that > > > > > >> > > > > > > > > > > although we added topic id in KIP-516, we > > never > > > > > >> > > implemented the > > > > > >> > > > > > > logic > > > > > >> > > > > > > > > on > > > > > >> > > > > > > > > > > consumer/producers leveraging the related > > newer > > > > > >> versioned > > > > > >> > > RPCs, > > > > > >> > > > > > > > > instead we > > > > > >> > > > > > > > > > > just set the topic id as empty UUID. We want > > to keep > > > > > >> the > > > > > >> > > > > > > > > consumer/producer > > > > > >> > > > > > > > > > > to be thin and only delegate the reasoning > on > > broker > > > > > >> and > > > > > >> > > > > > > potentially > > > > > >> > > > > > > > > admin > > > > > >> > > > > > > > > > > clients. > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > Thanks, > > > > > >> > > > > > > > > > > > Ismael > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > On Wed, Jul 6, 2022 at 10:45 AM David > Jacot > > > > > >> > > > > > > > > <dja...@confluent.io.invalid> > > > > > >> > > > > > > > > > > > wrote: > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > Hi all, > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > I would like to start a discussion > thread > > on > > > > > >> KIP-848: > > > > > >> > > The > > > > > >> > > > > Next > > > > > >> > > > > > > > > > > > > Generation of the Consumer Rebalance > > Protocol. > > > > > >> With > > > > > >> > > this > > > > > >> > > > > KIP, > > > > > >> > > > > > > we > > > > > >> > > > > > > > > aim > > > > > >> > > > > > > > > > > > > to make the rebalance protocol (for > > consumers) > > > > > >> more > > > > > >> > > > > reliable, > > > > > >> > > > > > > more > > > > > >> > > > > > > > > > > > > scalable, easier to implement for > > clients, and > > > > > >> easier > > > > > >> > > to > > > > > >> > > > > debug > > > > > >> > > > > > > for > > > > > >> > > > > > > > > > > > > operators. > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > The KIP is here: > > > > > >> > > > > https://cwiki.apache.org/confluence/x/HhD1D. > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > Please take a look and let me know what > > you > > > > > think. > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > Best, > > > > > >> > > > > > > > > > > > > David > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > PS: I will be away from July 18th to > > August 8th. > > > > > >> That > > > > > >> > > gives > > > > > >> > > > > > > you a > > > > > >> > > > > > > > > bit > > > > > >> > > > > > > > > > > > > of time to read and digest this long > KIP. > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > -- > > > > > >> > > > > > > > > > > -- Guozhang > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > -- > > > > > >> > > > > > > > -- Guozhang > > > > > >> > > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> -- > > > > > >> -- Guozhang > > > > > >> > > > > > > > > > > > > > >