Hello David, Alright I think that's sufficient. Just to make that clear in the doc, could we update:
1) the heartbeat request handling section, stating when coordinator will trigger rebalance based on the HB's member metadata / reason? 2) the "Rebalance Triggers" section to include what we described in "Group Epoch - Trigger a rebalance" section as well? Guozhang On Fri, Sep 9, 2022 at 1:28 AM David Jacot <dja...@confluent.io.invalid> wrote: > Hi Guozhang, > > I thought that the assignor will always be consulted when the next > heartbeat request is constructed. In other words, > `PartitionAssignor#metadata` will be called for every heartbeat. This > gives the opportunity for the assignor to enforce a rebalance by > setting the reason to a non-zero value or by changing the bytes. Do > you think that this is not sufficient? Are you concerned by the delay? > > Best, > David > > On Fri, Sep 9, 2022 at 7:10 AM Guozhang Wang <wangg...@gmail.com> wrote: > > > > Hello David, > > > > One of Jun's comments make me thinking: > > > > ``` > > In this case, a new assignment is triggered by the client side > > assignor. When constructing the HB, the consumer will always consult > > the client side assignor and propagate the information to the group > > coordinator. In other words, we don't expect users to call > > Consumer#enforceRebalance anymore. > > ``` > > > > As I looked at the current PartitionAssignor's interface, we actually do > > not have a way yet to instruct how to construct the next HB request, e.g. > > when the assignor wants to enforce a new rebalance with a new assignment, > > we'd need some customizable APIs inside the PartitionAssignor to indicate > > the next HB telling broker about so. WDYT about adding such an API on the > > PartitionAssignor? > > > > > > Guozhang > > > > > > On Tue, Sep 6, 2022 at 6:09 AM David Jacot <dja...@confluent.io.invalid> > > wrote: > > > > > Hi Jun, > > > > > > I have updated the KIP to include your feedback. I have also tried to > > > clarify the parts which were not cleared. > > > > > > Best, > > > David > > > > > > On Fri, Sep 2, 2022 at 4:18 PM David Jacot <dja...@confluent.io> > wrote: > > > > > > > > Hi Jun, > > > > > > > > Thanks for your feedback. Let me start by answering your questions > > > > inline and I will update the KIP next week. > > > > > > > > > Thanks for the KIP. Overall, the main benefits of the KIP seem to > be > > > fewer > > > > > RPCs during rebalance and more efficient support of wildcard. A few > > > > > comments below. > > > > > > > > I would also add that the KIP removes the global sync barrier in the > > > > protocol which is essential to improve group stability and > > > > scalability, and the KIP also simplifies the client by moving most of > > > > the logic to the server side. > > > > > > > > > 30. ConsumerGroupHeartbeatRequest > > > > > 30.1 ServerAssignor is a singleton. Do we plan to support rolling > > > changing > > > > > of the partition assignor in the consumers? > > > > > > > > Definitely. The group coordinator will use the assignor used by a > > > > majority of the members. This allows the group to move from one > > > > assignor to another by a roll. This is explained in the Assignor > > > > Selection chapter. > > > > > > > > > 30.2 For each field, could you explain whether it's required in > every > > > > > request or the scenarios when it needs to be filled? For example, > it's > > > not > > > > > clear to me when TopicPartitions needs to be filled. > > > > > > > > The client is expected to set those fields in case of a connection > > > > issue (e.g. timeout) or when the fields have changed since the last > > > > HB. The server populates those fields as long as the member is not > > > > fully reconciled - the member should acknowledge that it has the > > > > expected epoch and assignment. I will clarify this in the KIP. > > > > > > > > > 31. In the current consumer protocol, the rack affinity between the > > > client > > > > > and the broker is only considered during fetching, but not during > > > assigning > > > > > partitions to consumers. Sometimes, once the assignment is made, > there > > > is > > > > > no opportunity for read affinity because no replicas of assigned > > > partitions > > > > > are close to the member. I am wondering if we should use this > > > opportunity > > > > > to address this by including rack in GroupMember. > > > > > > > > That's an interesting idea. I don't see any issue with adding the > rack > > > > to the members. I will do so. > > > > > > > > > 32. On the metric side, often, it's useful to know how busy a group > > > > > coordinator is. By moving the event loop model, it seems that we > could > > > add > > > > > a metric that tracks the fraction of the time the event loop is > doing > > > the > > > > > actual work. > > > > > > > > That's a great idea. I will add it. Thanks. > > > > > > > > > 33. Could we add a section on coordinator failover handling? For > > > example, > > > > > does it need to trigger the check if any group with the wildcard > > > > > subscription now has a new matching topic? > > > > > > > > Sure. When the new group coordinator takes over, it has to: > > > > * Setup the session timeouts. > > > > * Trigger a new assignment if a client side assignor is used. We > don't > > > > store the information about the member selected to run the assignment > > > > so we have to start a new one. > > > > * Update the topics metadata, verify the wildcard subscriptions, and > > > > trigger a rebalance if needed. > > > > > > > > > 34. ConsumerGroupMetadataValue, > ConsumerGroupPartitionMetadataValue, > > > > > ConsumerGroupMemberMetadataValue: Could we document what the epoch > > > field > > > > > reflects? For example, does the epoch in ConsumerGroupMetadataValue > > > reflect > > > > > the latest group epoch? What about the one in > > > > > ConsumerGroupPartitionMetadataValue and > > > ConsumerGroupMemberMetadataValue? > > > > > > > > Sure. I will clarify that but it is always the latest group epoch. > > > > When the group state is updated, the group epoch is bumped so we use > > > > that one for all the change records related to the update. > > > > > > > > > 35. "the group coordinator will ensure that the following > invariants > > > are > > > > > met: ... All members exists." It's possible for a member not to > get any > > > > > assigned partitions, right? > > > > > > > > That's right. Here I meant that the members provided by the assignor > > > > in the assignment must exist in the group. The assignor can not make > > > > up new member ids. > > > > > > > > > 36. "He can rejoins the group with a member epoch equals to 0": > When > > > would > > > > > a consumer rejoin and what member id would be used? > > > > > > > > A member is expected to abandon all its partitions and rejoins when > it > > > > receives the FENCED_MEMBER_EPOCH error. In this case, the group > > > > coordinator will have removed the member from the group. The member > > > > can rejoin the group with the same member id but with 0 as epoch. Let > > > > me see if I can clarify this in the KIP. > > > > > > > > > 37. "Instead, power users will have the ability to trigger a > > > reassignment > > > > > by either providing a non-zero reason or by updating the assignor > > > > > metadata." Hmm, this seems to be conflicting with the deprecation > of > > > > > Consumer#enforeRebalance. > > > > > > > > In this case, a new assignment is triggered by the client side > > > > assignor. When constructing the HB, the consumer will always consult > > > > the client side assignor and propagate the information to the group > > > > coordinator. In other words, we don't expect users to call > > > > Consumer#enforceRebalance anymore. > > > > > > > > > 38. The reassignment examples are nice. But the section seems to > have > > > > > multiple typos. > > > > > 38.1 When the group transitions to epoch 2, B immediately gets into > > > > > "epoch=1, partitions=[foo-2]", which seems incorrect. > > > > > 38.2 When the group transitions to epoch 3, C seems to get into > > > epoch=3, > > > > > partitions=[foo-1] too early. > > > > > 38.3 After A transitions to epoch 3, C still has A - epoch=2, > > > > > partitions=[foo-0]. > > > > > > > > Sorry for that! I will revise them. > > > > > > > > > 39. Rolling upgrade of consumers: Do we support the upgrade from > any > > > old > > > > > version to new one? > > > > > > > > We will support upgrading from the consumer protocol version 3, > > > > introduced in KIP-792. KIP-792 is not implemented yet so the earliest > > > > version is unknown at the moment. This is explained in the migration > > > > plan chapter. > > > > > > > > Thanks again for your feedback, Jun. I will update the KIP based on > it > > > > next week. > > > > > > > > Best, > > > > David > > > > > > > > On Thu, Sep 1, 2022 at 9:07 PM Jun Rao <j...@confluent.io.invalid> > wrote: > > > > > > > > > > Hi, David, > > > > > > > > > > Thanks for the KIP. Overall, the main benefits of the KIP seem to > be > > > fewer > > > > > RPCs during rebalance and more efficient support of wildcard. A few > > > > > comments below. > > > > > > > > > > 30. ConsumerGroupHeartbeatRequest > > > > > 30.1 ServerAssignor is a singleton. Do we plan to support rolling > > > changing > > > > > of the partition assignor in the consumers? > > > > > 30.2 For each field, could you explain whether it's required in > every > > > > > request or the scenarios when it needs to be filled? For example, > it's > > > not > > > > > clear to me when TopicPartitions needs to be filled. > > > > > > > > > > 31. In the current consumer protocol, the rack affinity between the > > > client > > > > > and the broker is only considered during fetching, but not during > > > assigning > > > > > partitions to consumers. Sometimes, once the assignment is made, > there > > > is > > > > > no opportunity for read affinity because no replicas of assigned > > > partitions > > > > > are close to the member. I am wondering if we should use this > > > opportunity > > > > > to address this by including rack in GroupMember. > > > > > > > > > > 32. On the metric side, often, it's useful to know how busy a group > > > > > coordinator is. By moving the event loop model, it seems that we > could > > > add > > > > > a metric that tracks the fraction of the time the event loop is > doing > > > the > > > > > actual work. > > > > > > > > > > 33. Could we add a section on coordinator failover handling? For > > > example, > > > > > does it need to trigger the check if any group with the wildcard > > > > > subscription now has a new matching topic? > > > > > > > > > > 34. ConsumerGroupMetadataValue, > ConsumerGroupPartitionMetadataValue, > > > > > ConsumerGroupMemberMetadataValue: Could we document what the epoch > > > field > > > > > reflects? For example, does the epoch in ConsumerGroupMetadataValue > > > reflect > > > > > the latest group epoch? What about the one in > > > > > ConsumerGroupPartitionMetadataValue and > > > ConsumerGroupMemberMetadataValue? > > > > > > > > > > 35. "the group coordinator will ensure that the following > invariants > > > are > > > > > met: ... All members exists." It's possible for a member not to > get any > > > > > assigned partitions, right? > > > > > > > > > > 36. "He can rejoins the group with a member epoch equals to 0": > When > > > would > > > > > a consumer rejoin and what member id would be used? > > > > > > > > > > 37. "Instead, power users will have the ability to trigger a > > > reassignment > > > > > by either providing a non-zero reason or by updating the assignor > > > > > metadata." Hmm, this seems to be conflicting with the deprecation > of > > > > > Consumer#enforeRebalance. > > > > > > > > > > 38. The reassignment examples are nice. But the section seems to > have > > > > > multiple typos. > > > > > 38.1 When the group transitions to epoch 2, B immediately gets into > > > > > "epoch=1, partitions=[foo-2]", which seems incorrect. > > > > > 38.2 When the group transitions to epoch 3, C seems to get into > > > epoch=3, > > > > > partitions=[foo-1] too early. > > > > > 38.3 After A transitions to epoch 3, C still has A - epoch=2, > > > > > partitions=[foo-0]. > > > > > > > > > > 39. Rolling upgrade of consumers: Do we support the upgrade from > any > > > old > > > > > version to new one? > > > > > > > > > > Thanks, > > > > > > > > > > Jun > > > > > > > > > > On Mon, Aug 29, 2022 at 9:20 AM David Jacot > > > <dja...@confluent.io.invalid> > > > > > wrote: > > > > > > > > > > > Hi all, > > > > > > > > > > > > The KIP states that we will re-implement the coordinator in > Java. I > > > > > > discussed this offline with a few folks and folks are concerned > that > > > > > > we could introduce many regressions in the old protocol if we do > so. > > > > > > Therefore, I am going to remove this statement from the KIP. It > is an > > > > > > implementation detail after all so it does not have to be > decided at > > > > > > this stage. We will likely start by trying to refactor the > current > > > > > > implementation as a first step. > > > > > > > > > > > > Cheers, > > > > > > David > > > > > > > > > > > > On Mon, Aug 29, 2022 at 3:52 PM David Jacot <dja...@confluent.io > > > > > wrote: > > > > > > > > > > > > > > Hi Luke, > > > > > > > > > > > > > > > 1.1. I think the state machine are: "Empty, assigning, > > > reconciling, > > > > > > stable, > > > > > > > > dead" mentioned in Consumer Group States section, right? > > > > > > > > > > > > > > This sentence does not refer to those group states but rather > to a > > > > > > > state machine replication (SMR). This refers to the entire > state of > > > > > > > group coordinator which is replicated via the log layer. I will > > > > > > > clarify this in the KIP. > > > > > > > > > > > > > > > 1.2. What do you mean "each state machine is modelled as an > event > > > > > > loop"? > > > > > > > > > > > > > > The idea is to follow a model similar to the new quorum > > > controller. We > > > > > > > will have N threads to process events. Each __consumer_offsets > > > > > > > partition is assigned to a unique thread and all the events > (e.g. > > > > > > > requests, callbacks, etc.) are processed by this thread. This > > > simplify > > > > > > > concurrency and will enable us to do simulation testing for the > > > group > > > > > > > coordinator. > > > > > > > > > > > > > > > 1.3. Why do we need a state machine per *__consumer_offsets* > > > > > > partitions? > > > > > > > > Not a state machine "per consumer group" owned by a group > > > coordinator? > > > > > > For > > > > > > > > example, if one group coordinator owns 2 consumer groups, and > > > both > > > > > > exist in > > > > > > > > *__consumer_offsets-0*, will we have 1 state machine for it, > or > > > 2? > > > > > > > > > > > > > > See 1.1. The confusion comes from there, I think. > > > > > > > > > > > > > > > 1.4. I know the "*group.coordinator.threads" *is the number > of > > > threads > > > > > > used > > > > > > > > to run the state machines. But I'm wondering if the purpose > of > > > the > > > > > > threads > > > > > > > > is only to keep the state of each consumer group (or > > > > > > *__consumer_offsets* > > > > > > > > partitions?), and no heavy computation, why should we need > > > > > > multi-threads > > > > > > > > here? > > > > > > > > > > > > > > See 1.2. The idea is to have an ability to shard the > processing as > > > the > > > > > > > computation could be heavy. > > > > > > > > > > > > > > > 2.1. The consumer session timeout, why does the default > session > > > > > > timeout not > > > > > > > > locate between min (45s) and max(60s)? I thought the min/max > > > session > > > > > > > > timeout is to define lower/upper bound of it, no? > > > > > > > > > > > > > > > > group.consumer.session.timeout.ms int 30s The timeout to > detect > > > client > > > > > > > > failures when using the consumer group protocol. > > > > > > > > group.consumer.min.session.timeout.ms int 45s The minimum > > > session > > > > > > timeout. > > > > > > > > group.consumer.max.session.timeout.ms int 60s The maximum > > > session > > > > > > timeout. > > > > > > > > > > > > > > This is indeed a mistake. The default session timeout should > be 45s > > > > > > > (the current default). > > > > > > > > > > > > > > > 2.2. The default server side assignor are [range, uniform], > > > which means > > > > > > > > we'll default to "range" assignor. I'd like to know why not > > > uniform > > > > > > one? I > > > > > > > > thought usually users will choose uniform assignor (former > sticky > > > > > > assinor) > > > > > > > > for better evenly distribution. Any other reason we choose > range > > > > > > assignor > > > > > > > > as default? > > > > > > > > group.consumer.assignors List range, uniform The server side > > > assignors. > > > > > > > > > > > > > > The order on the server side has no influence because the > client > > > must > > > > > > > chose the selector that he wants to use. There is no default > in the > > > > > > > current proposal. If the assignor is not specified by the > client, > > > the > > > > > > > request is rejected. The default client value for > > > > > > > `group.remote.assignor` is `uniform` though. > > > > > > > > > > > > > > Thanks for your very good comments, Luke. I hope that my > answers > > > help > > > > > > > to clarify things. I will update the KIP as well based on your > > > > > > > feedback. > > > > > > > > > > > > > > Cheers, > > > > > > > David > > > > > > > > > > > > > > On Mon, Aug 22, 2022 at 9:29 AM Luke Chen <show...@gmail.com> > > > wrote: > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > Thanks for the update. > > > > > > > > > > > > > > > > Some more questions: > > > > > > > > 1. In Group Coordinator section, you mentioned: > > > > > > > > > The new group coordinator will have a state machine per > > > > > > > > *__consumer_offsets* partitions, where each state machine is > > > modelled > > > > > > as an > > > > > > > > event loop. Those state machines will be executed in > > > > > > > > *group.coordinator.threads* threads. > > > > > > > > > > > > > > > > 1.1. I think the state machine are: "Empty, assigning, > > > reconciling, > > > > > > stable, > > > > > > > > dead" mentioned in Consumer Group States section, right? > > > > > > > > 1.2. What do you mean "each state machine is modelled as an > event > > > > > > loop"? > > > > > > > > 1.3. Why do we need a state machine per *__consumer_offsets* > > > > > > partitions? > > > > > > > > Not a state machine "per consumer group" owned by a group > > > coordinator? > > > > > > For > > > > > > > > example, if one group coordinator owns 2 consumer groups, and > > > both > > > > > > exist in > > > > > > > > *__consumer_offsets-0*, will we have 1 state machine for it, > or > > > 2? > > > > > > > > 1.4. I know the "*group.coordinator.threads" *is the number > of > > > threads > > > > > > used > > > > > > > > to run the state machines. But I'm wondering if the purpose > of > > > the > > > > > > threads > > > > > > > > is only to keep the state of each consumer group (or > > > > > > *__consumer_offsets* > > > > > > > > partitions?), and no heavy computation, why should we need > > > > > > multi-threads > > > > > > > > here? > > > > > > > > > > > > > > > > 2. For the default value in the new configs: > > > > > > > > 2.1. The consumer session timeout, why does the default > session > > > > > > timeout not > > > > > > > > locate between min (45s) and max(60s)? I thought the min/max > > > session > > > > > > > > timeout is to define lower/upper bound of it, no? > > > > > > > > > > > > > > > > group.consumer.session.timeout.ms int 30s The timeout to > detect > > > client > > > > > > > > failures when using the consumer group protocol. > > > > > > > > group.consumer.min.session.timeout.ms int 45s The minimum > > > session > > > > > > timeout. > > > > > > > > group.consumer.max.session.timeout.ms int 60s The maximum > > > session > > > > > > timeout. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2.2. The default server side assignor are [range, uniform], > > > which means > > > > > > > > we'll default to "range" assignor. I'd like to know why not > > > uniform > > > > > > one? I > > > > > > > > thought usually users will choose uniform assignor (former > sticky > > > > > > assinor) > > > > > > > > for better evenly distribution. Any other reason we choose > range > > > > > > assignor > > > > > > > > as default? > > > > > > > > group.consumer.assignors List range, uniform The server side > > > assignors. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you. > > > > > > > > Luke > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Aug 22, 2022 at 2:10 PM Luke Chen <show...@gmail.com > > > > > wrote: > > > > > > > > > > > > > > > > > Hi Sagar, > > > > > > > > > > > > > > > > > > I have some thoughts about Kafka Connect integrating with > > > KIP-848, > > > > > > but I > > > > > > > > > think we should have a separate discussion thread for the > Kafka > > > > > > Connect > > > > > > > > > KIP: Integrating Kafka Connect With New Consumer Rebalance > > > Protocol > > > > > > [1], > > > > > > > > > and let this discussion thread focus on consumer rebalance > > > protocol, > > > > > > WDYT? > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/%5BDRAFT%5DIntegrating+Kafka+Connect+With+New+Consumer+Rebalance+Protocol > > > > > > > > > > > > > > > > > > Thank you. > > > > > > > > > Luke > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Aug 12, 2022 at 9:31 PM Sagar < > > > sagarmeansoc...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > > > > >> Thank you Guozhang/David for the feedback. Looks like > there's > > > > > > agreement on > > > > > > > > >> using separate APIs for Connect. I would revisit the doc > and > > > see > > > > > > what > > > > > > > > >> changes are to be made. > > > > > > > > >> > > > > > > > > >> Thanks! > > > > > > > > >> Sagar. > > > > > > > > >> > > > > > > > > >> On Tue, Aug 9, 2022 at 7:11 PM David Jacot > > > > > > <dja...@confluent.io.invalid> > > > > > > > > >> wrote: > > > > > > > > >> > > > > > > > > >> > Hi Sagar, > > > > > > > > >> > > > > > > > > > >> > Thanks for the feedback and the document. That's really > > > helpful. I > > > > > > > > >> > will take a look at it. > > > > > > > > >> > > > > > > > > > >> > Overall, it seems to me that both Connect and the > Consumer > > > could > > > > > > share > > > > > > > > >> > the same underlying "engine". The main difference is > that > > > the > > > > > > Consumer > > > > > > > > >> > assigns topic-partitions to members whereas Connect > assigns > > > tasks > > > > > > to > > > > > > > > >> > workers. I see two ways to move forward: > > > > > > > > >> > 1) We extend the new proposed APIs to support different > > > resource > > > > > > types > > > > > > > > >> > (e.g. partitions, tasks, etc.); or > > > > > > > > >> > 2) We use new dedicated APIs for Connect. The dedicated > APIs > > > > > > would be > > > > > > > > >> > similar to the new ones but different on the > > > content/resources and > > > > > > > > >> > they would rely on the same engine on the coordinator > side. > > > > > > > > >> > > > > > > > > > >> > I personally lean towards 2) because I am not a fan of > > > > > > overcharging > > > > > > > > >> > APIs to serve different purposes. That being said, I am > not > > > > > > opposed to > > > > > > > > >> > 1) if we can find an elegant way to do it. > > > > > > > > >> > > > > > > > > > >> > I think that we can continue to discuss it here for now > in > > > order > > > > > > to > > > > > > > > >> > ensure that this KIP is compatible with what we will do > for > > > > > > Connect in > > > > > > > > >> > the future. > > > > > > > > >> > > > > > > > > > >> > Best, > > > > > > > > >> > David > > > > > > > > >> > > > > > > > > > >> > On Mon, Aug 8, 2022 at 2:41 PM David Jacot < > > > dja...@confluent.io> > > > > > > wrote: > > > > > > > > >> > > > > > > > > > > >> > > Hi all, > > > > > > > > >> > > > > > > > > > > >> > > I am back from vacation. I will go through and address > > > your > > > > > > comments > > > > > > > > >> > > in the coming days. Thanks for your feedback. > > > > > > > > >> > > > > > > > > > > >> > > Cheers, > > > > > > > > >> > > David > > > > > > > > >> > > > > > > > > > > >> > > On Wed, Aug 3, 2022 at 10:05 PM Gregory Harris < > > > > > > gharris1...@gmail.com > > > > > > > > >> > > > > > > > > > >> > wrote: > > > > > > > > >> > > > > > > > > > > > >> > > > Hey All! > > > > > > > > >> > > > > > > > > > > > >> > > > Thanks for the KIP, it's wonderful to see > cooperative > > > > > > rebalancing > > > > > > > > >> > making it > > > > > > > > >> > > > down the stack! > > > > > > > > >> > > > > > > > > > > > >> > > > I had a few questions: > > > > > > > > >> > > > > > > > > > > > >> > > > 1. The 'Rejected Alternatives' section describes how > > > member > > > > > > epoch > > > > > > > > >> > should > > > > > > > > >> > > > advance in step with the group epoch and assignment > > > epoch > > > > > > values. I > > > > > > > > >> > think > > > > > > > > >> > > > that this is a good idea for the reasons described > in > > > the > > > > > > KIP. When > > > > > > > > >> the > > > > > > > > >> > > > protocol is incrementally assigning partitions to a > > > worker, > > > > > > what > > > > > > > > >> member > > > > > > > > >> > > > epoch does each incremental assignment use? Are > member > > > epochs > > > > > > > > >> re-used, > > > > > > > > >> > and > > > > > > > > >> > > > a single member epoch can correspond to multiple > > > different > > > > > > > > >> > (monotonically > > > > > > > > >> > > > larger) assignments? > > > > > > > > >> > > > > > > > > > > > >> > > > 2. Is the Assignor's 'Reason' field opaque to the > group > > > > > > > > >> coordinator? If > > > > > > > > >> > > > not, should custom client-side assignor > implementations > > > > > > interact > > > > > > > > >> with > > > > > > > > >> > the > > > > > > > > >> > > > Reason field, and how is its common meaning agreed > > > upon? If > > > > > > so, what > > > > > > > > >> > is the > > > > > > > > >> > > > benefit of a distinct Reason field over including > such > > > > > > functionality > > > > > > > > >> > in the > > > > > > > > >> > > > opaque metadata? > > > > > > > > >> > > > > > > > > > > > >> > > > 3. The following is included in the KIP: "Thanks to > > > this, the > > > > > > input > > > > > > > > >> of > > > > > > > > >> > the > > > > > > > > >> > > > client side assignor is entirely driven by the group > > > > > > coordinator. > > > > > > > > >> The > > > > > > > > >> > > > consumer is no longer responsible for maintaining > any > > > state > > > > > > besides > > > > > > > > >> its > > > > > > > > >> > > > assigned partitions." Does this mean that the > > > client-side > > > > > > assignor > > > > > > > > >> MAY > > > > > > > > >> > > > incorporate additional non-Metadata state (such as > > > partition > > > > > > > > >> > throughput, > > > > > > > > >> > > > cpu/memory metrics, config topics, etc), or that > > > additional > > > > > > > > >> > non-Metadata > > > > > > > > >> > > > state SHOULD NOT be used? > > > > > > > > >> > > > > > > > > > > > >> > > > 4. I see that there are separate classes > > > > > > > > >> > > > for > > > org.apache.kafka.server.group.consumer.PartitionAssignor > > > > > > > > >> > > > and > org.apache.kafka.clients.consumer.PartitionAssignor > > > that > > > > > > seem to > > > > > > > > >> > > > overlap significantly. Is it possible for these two > > > > > > implementations > > > > > > > > >> to > > > > > > > > >> > be > > > > > > > > >> > > > unified? This would serve to promote feature parity > of > > > > > > server-side > > > > > > > > >> and > > > > > > > > >> > > > client-side assignors, and would also facilitate > > > operational > > > > > > > > >> > flexibility in > > > > > > > > >> > > > certain situations. For example, if a server-side > > > assignor > > > > > > has some > > > > > > > > >> > poor > > > > > > > > >> > > > behavior and needs a patch, deploying the patched > > > assignor to > > > > > > the > > > > > > > > >> > client > > > > > > > > >> > > > and switching one consumer group to a client-side > > > assignor > > > > > > may be > > > > > > > > >> > faster > > > > > > > > >> > > > and less risky than patching all of the brokers. > With > > > the > > > > > > currently > > > > > > > > >> > > > proposed distinct APIs, a non-trivial > reimplementation > > > would > > > > > > have > > > > > > > > >> to be > > > > > > > > >> > > > assembled, and if the two APIs have diverged > > > significantly, > > > > > > then it > > > > > > > > >> is > > > > > > > > >> > > > possible that a reimplementation would not be > possible. > > > > > > > > >> > > > > > > > > > > > >> > > > -- > > > > > > > > >> > > > Greg Harris > > > > > > > > >> > > > gharris1...@gmail.com > > > > > > > > >> > > > github.com/gharris1727 > > > > > > > > >> > > > > > > > > > > > >> > > > On Wed, Aug 3, 2022 at 8:39 AM Sagar < > > > > > > sagarmeansoc...@gmail.com> > > > > > > > > >> > wrote: > > > > > > > > >> > > > > > > > > > > > >> > > > > Hi Guozhang/David, > > > > > > > > >> > > > > > > > > > > > > >> > > > > I created a confluence page to discuss how Connect > > > would > > > > > > need to > > > > > > > > >> > change > > > > > > > > >> > > > > based on the new rebalance protocol. Here's the > page: > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/%5BDRAFT%5DIntegrating+Kafka+Connect+With+New+Consumer+Rebalance+Protocol > > > > > > > > >> > > > > > > > > > > > > >> > > > > It's also pretty longish and I have tried to keep > a > > > format > > > > > > > > >> similar to > > > > > > > > >> > > > > KIP-848. Let me know what you think. Also, do you > > > think this > > > > > > > > >> should > > > > > > > > >> > be > > > > > > > > >> > > > > moved to a separate discussion thread or is this > one > > > fine? > > > > > > > > >> > > > > > > > > > > > > >> > > > > Thanks! > > > > > > > > >> > > > > Sagar. > > > > > > > > >> > > > > > > > > > > > > >> > > > > On Tue, Jul 26, 2022 at 7:37 AM Sagar < > > > > > > sagarmeansoc...@gmail.com> > > > > > > > > >> > wrote: > > > > > > > > >> > > > > > > > > > > > > >> > > > > > Hello Guozhang, > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > Thank you so much for the doc on Kafka Streams. > > > Sure, I > > > > > > would do > > > > > > > > >> > the > > > > > > > > >> > > > > > analysis and come up with such a document. > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > Thanks! > > > > > > > > >> > > > > > Sagar. > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > On Tue, Jul 26, 2022 at 4:47 AM Guozhang Wang < > > > > > > > > >> wangg...@gmail.com> > > > > > > > > >> > > > > wrote: > > > > > > > > >> > > > > > > > > > > > > > >> > > > > >> Hello Sagar, > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> It would be great if you could come back with > some > > > > > > analysis on > > > > > > > > >> > how to > > > > > > > > >> > > > > >> implement the Connect side integration with > the new > > > > > > protocol; > > > > > > > > >> so > > > > > > > > >> > far > > > > > > > > >> > > > > >> besides leveraging on the new "protocol type" > we > > > did not > > > > > > yet > > > > > > > > >> think > > > > > > > > >> > > > > through > > > > > > > > >> > > > > >> the Connect side implementations. For Streams > > > here's a > > > > > > draft of > > > > > > > > >> > > > > >> integration > > > > > > > > >> > > > > >> plan: > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > https://docs.google.com/document/d/17PNz2sGoIvGyIzz8vLyJTJTU2rqnD_D9uHJnH9XARjU/edit#heading=h.pdgirmi57dvn > > > > > > > > >> > > > > >> just FYI for your analysis on Connect. > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> On Tue, Jul 19, 2022 at 10:48 PM Sagar < > > > > > > > > >> sagarmeansoc...@gmail.com > > > > > > > > >> > > > > > > > > > > >> > > > > wrote: > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> > Hi David, > > > > > > > > >> > > > > >> > > > > > > > > > >> > > > > >> > Thank you for your response. The reason I > thought > > > > > > connect can > > > > > > > > >> > also fit > > > > > > > > >> > > > > >> into > > > > > > > > >> > > > > >> > this new scheme is that even today the > connect > > > uses a > > > > > > > > >> > > > > WorkerCoordinator > > > > > > > > >> > > > > >> > extending from AbstractCoordinator to empower > > > > > > rebalances of > > > > > > > > >> > > > > >> > tasks/connectors. The WorkerCoordinator sets > the > > > > > > > > >> protocolType() > > > > > > > > >> > to > > > > > > > > >> > > > > >> connect > > > > > > > > >> > > > > >> > and uses the metadata() method by plumbing > into > > > > > > > > >> > > > > >> JoinGroupRequestProtocol. > > > > > > > > >> > > > > >> > > > > > > > > > >> > > > > >> > I think the changes to support connect would > be > > > > > > similar at a > > > > > > > > >> > high > > > > > > > > >> > > > > level > > > > > > > > >> > > > > >> to > > > > > > > > >> > > > > >> > the changes in streams mainly because of the > > > Client > > > > > > side > > > > > > > > >> > assignors > > > > > > > > >> > > > > being > > > > > > > > >> > > > > >> > used in both. At an implementation level, we > > > might > > > > > > need to > > > > > > > > >> make > > > > > > > > >> > a lot > > > > > > > > >> > > > > of > > > > > > > > >> > > > > >> > changes to get onto this new assignment > protocol > > > like > > > > > > > > >> enhancing > > > > > > > > >> > the > > > > > > > > >> > > > > >> > JoinGroup request/response and SyncGroup and > > > using > > > > > > > > >> > > > > >> ConsumerGroupHeartbeat > > > > > > > > >> > > > > >> > API etc again on similar lines to streams (or > > > there > > > > > > might be > > > > > > > > >> > > > > >> deviations). I > > > > > > > > >> > > > > >> > would try to perform a detailed analysis of > the > > > same > > > > > > and we > > > > > > > > >> > can have > > > > > > > > >> > > > > a > > > > > > > > >> > > > > >> > separate discussion thread for that as that > would > > > > > > derail this > > > > > > > > >> > > > > discussion > > > > > > > > >> > > > > >> > thread. Let me know if that sounds good to > you. > > > > > > > > >> > > > > >> > > > > > > > > > >> > > > > >> > Thanks! > > > > > > > > >> > > > > >> > Sagar. > > > > > > > > >> > > > > >> > > > > > > > > > >> > > > > >> > > > > > > > > > >> > > > > >> > > > > > > > > > >> > > > > >> > On Fri, Jul 15, 2022 at 5:47 PM David Jacot > > > > > > > > >> > > > > <dja...@confluent.io.invalid > > > > > > > > >> > > > > >> > > > > > > > > > >> > > > > >> > wrote: > > > > > > > > >> > > > > >> > > > > > > > > > >> > > > > >> > > Hi Sagar, > > > > > > > > >> > > > > >> > > > > > > > > > > >> > > > > >> > > Thanks for your comments. > > > > > > > > >> > > > > >> > > > > > > > > > > >> > > > > >> > > 1) Yes. That refers to `Assignment#error`. > > > Sure, I > > > > > > can > > > > > > > > >> > mention it. > > > > > > > > >> > > > > >> > > > > > > > > > > >> > > > > >> > > 2) The idea is to transition C from his > current > > > > > > assignment > > > > > > > > >> to > > > > > > > > >> > his > > > > > > > > >> > > > > >> > > target assignment when he can move to > epoch 3. > > > When > > > > > > that > > > > > > > > >> > happens, > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > > member assignment is updated and persisted > > > with all > > > > > > its > > > > > > > > >> > assigned > > > > > > > > >> > > > > >> > > partitions even if they are not all revoked > > > yet. In > > > > > > other > > > > > > > > >> > words, the > > > > > > > > >> > > > > >> > > member assignment becomes the target > > > assignment. > > > > > > This is > > > > > > > > >> > basically > > > > > > > > >> > > > > an > > > > > > > > >> > > > > >> > > optimization to avoid having to write all > the > > > > > > changes to > > > > > > > > >> the > > > > > > > > >> > log. > > > > > > > > >> > > > > The > > > > > > > > >> > > > > >> > > examples are based on the persisted state > so I > > > > > > understand > > > > > > > > >> the > > > > > > > > >> > > > > >> > > confusion. Let me see if I can improve > this in > > > the > > > > > > > > >> > description. > > > > > > > > >> > > > > >> > > > > > > > > > > >> > > > > >> > > 3) Regarding Connect, it could reuse the > > > protocol > > > > > > with a > > > > > > > > >> > client side > > > > > > > > >> > > > > >> > > assignor if it fits in the protocol. The > > > assignment > > > > > > is > > > > > > > > >> about > > > > > > > > >> > > > > >> > > topicid-partitions + metadata, could > Connect > > > fit > > > > > > into this? > > > > > > > > >> > > > > >> > > > > > > > > > > >> > > > > >> > > Best, > > > > > > > > >> > > > > >> > > David > > > > > > > > >> > > > > >> > > > > > > > > > > >> > > > > >> > > On Fri, Jul 15, 2022 at 1:55 PM Sagar < > > > > > > > > >> > sagarmeansoc...@gmail.com> > > > > > > > > >> > > > > >> wrote: > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > Hi David, > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > Thanks for the KIP. I just had minor > > > observations: > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > 1) In the Assignment Error section in > Client > > > Side > > > > > > mode > > > > > > > > >> > Assignment > > > > > > > > >> > > > > >> > > process, > > > > > > > > >> > > > > >> > > > you mentioned => `In this case, the > client > > > side > > > > > > assignor > > > > > > > > >> can > > > > > > > > >> > > > > return > > > > > > > > >> > > > > >> an > > > > > > > > >> > > > > >> > > > error to the group coordinator`. In this > > > case are > > > > > > you > > > > > > > > >> > referring to > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > > Assignor returning an AssignmentError > that's > > > > > > listed down > > > > > > > > >> > towards > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > end? > > > > > > > > >> > > > > >> > > > If yes, do you think it would make sense > to > > > > > > mention this > > > > > > > > >> > > > > explicitly > > > > > > > > >> > > > > >> > here? > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > 2) In the Case Studies section, I have a > > > slight > > > > > > > > >> confusion, > > > > > > > > >> > not > > > > > > > > >> > > > > sure > > > > > > > > >> > > > > >> if > > > > > > > > >> > > > > >> > > > others have the same. Consider this step: > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > When B heartbeats, the group coordinator > > > > > > transitions him > > > > > > > > >> to > > > > > > > > >> > epoch > > > > > > > > >> > > > > 3 > > > > > > > > >> > > > > >> > > because > > > > > > > > >> > > > > >> > > > B has no partitions to revoke. It > persists > > > the > > > > > > change and > > > > > > > > >> > reply. > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > - Group (epoch=3) > > > > > > > > >> > > > > >> > > > - A > > > > > > > > >> > > > > >> > > > - B > > > > > > > > >> > > > > >> > > > - C > > > > > > > > >> > > > > >> > > > - Target Assignment (epoch=3) > > > > > > > > >> > > > > >> > > > - A - partitions=[foo-0] > > > > > > > > >> > > > > >> > > > - B - partitions=[foo-2] > > > > > > > > >> > > > > >> > > > - C - partitions=[foo-1] > > > > > > > > >> > > > > >> > > > - Member Assignment > > > > > > > > >> > > > > >> > > > - A - epoch=2, partitions=[foo-0, > > > foo-1] > > > > > > > > >> > > > > >> > > > - B - epoch=3, partitions=[foo-2] > > > > > > > > >> > > > > >> > > > - C - epoch=3, partitions=[foo-1] > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > When C heartbeats, it transitions to > epoch 3 > > > but > > > > > > cannot > > > > > > > > >> get > > > > > > > > >> > foo-1 > > > > > > > > >> > > > > >> yet. > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > Here,it's mentioned that member C can't > get > > > the > > > > > > foo-1 > > > > > > > > >> > partition > > > > > > > > >> > > > > yet, > > > > > > > > >> > > > > >> > but > > > > > > > > >> > > > > >> > > > based on the description above, it seems > it > > > > > > already has > > > > > > > > >> it. > > > > > > > > >> > Do you > > > > > > > > >> > > > > >> > think > > > > > > > > >> > > > > >> > > it > > > > > > > > >> > > > > >> > > > would be better to remove it and > populate it > > > only > > > > > > when it > > > > > > > > >> > actually > > > > > > > > >> > > > > >> gets > > > > > > > > >> > > > > >> > > it? > > > > > > > > >> > > > > >> > > > I see this in a lot of other places, so > have > > > I > > > > > > > > >> understood it > > > > > > > > >> > > > > >> > incorrectly > > > > > > > > >> > > > > >> > > ? > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > Regarding connect , it might be out of > scope > > > of > > > > > > this > > > > > > > > >> > discussion, > > > > > > > > >> > > > > but > > > > > > > > >> > > > > >> > from > > > > > > > > >> > > > > >> > > > what I understood it would probably be > > > running in > > > > > > client > > > > > > > > >> > side > > > > > > > > >> > > > > >> assignor > > > > > > > > >> > > > > >> > > mode > > > > > > > > >> > > > > >> > > > even on the new rebalance protocol as it > has > > > its > > > > > > own > > > > > > > > >> Custom > > > > > > > > >> > > > > >> > > Assignors(Eager > > > > > > > > >> > > > > >> > > > and IncrementalCooperative). > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > Thanks! > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > Sagar. > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > On Fri, Jul 15, 2022 at 5:00 PM David > Jacot > > > > > > > > >> > > > > >> > <dja...@confluent.io.invalid > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > wrote: > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > > Thanks Hector! Our goal is to move > forward > > > with > > > > > > > > >> > specialized API > > > > > > > > >> > > > > >> > > > > instead of relying on one generic API. > For > > > > > > Connect, we > > > > > > > > >> > can apply > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > > > exact same pattern and reuse/share the > core > > > > > > > > >> > implementation on > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > > > > server side. For the schema registry, I > > > think > > > > > > that we > > > > > > > > >> > should > > > > > > > > >> > > > > >> consider > > > > > > > > >> > > > > >> > > > > having a tailored API to do simple > > > > > > membership/leader > > > > > > > > >> > election. > > > > > > > > >> > > > > >> > > > > > > > > > > > > >> > > > > >> > > > > Best, > > > > > > > > >> > > > > >> > > > > David > > > > > > > > >> > > > > >> > > > > > > > > > > > > >> > > > > >> > > > > On Fri, Jul 15, 2022 at 10:22 AM Ismael > > > Juma < > > > > > > > > >> > ism...@juma.me.uk > > > > > > > > >> > > > > > > > > > > > > > >> > > > > >> > > wrote: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > >> > > > > >> > > > > > Three quick comments: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > >> > > > > >> > > > > > 1. Regarding java.util.regex.Pattern > vs > > > > > > > > >> > > > > >> com.google.re2j.Pattern, we > > > > > > > > >> > > > > >> > > > > should > > > > > > > > >> > > > > >> > > > > > document the differences in more > detail > > > before > > > > > > > > >> deciding > > > > > > > > >> > one > > > > > > > > >> > > > > way > > > > > > > > >> > > > > >> or > > > > > > > > >> > > > > >> > > > > another. > > > > > > > > >> > > > > >> > > > > > That said, if people pass > > > > > > java.util.regex.Pattern, > > > > > > > > >> they > > > > > > > > >> > expect > > > > > > > > >> > > > > >> > their > > > > > > > > >> > > > > >> > > > > > semantics to be honored. If we are > doing > > > > > > something > > > > > > > > >> > different, > > > > > > > > >> > > > > >> then > > > > > > > > >> > > > > >> > we > > > > > > > > >> > > > > >> > > > > > should consider adding an overload > with > > > our own > > > > > > > > >> Pattern > > > > > > > > >> > class > > > > > > > > >> > > > > (I > > > > > > > > >> > > > > >> > > don't > > > > > > > > >> > > > > >> > > > > > think we'd want to expose re2j's at > this > > > > > > point). > > > > > > > > >> > > > > >> > > > > > 2. Regarding topic ids, any major new > > > protocol > > > > > > should > > > > > > > > >> > > > > integrate > > > > > > > > >> > > > > >> > fully > > > > > > > > >> > > > > >> > > > > with > > > > > > > > >> > > > > >> > > > > > it and should handle the topic > > > recreation case > > > > > > > > >> > correctly. > > > > > > > > >> > > > > That's > > > > > > > > >> > > > > >> > the > > > > > > > > >> > > > > >> > > main > > > > > > > > >> > > > > >> > > > > > part we need to handle. I agree with > > > David > > > > > > that we'd > > > > > > > > >> > want to > > > > > > > > >> > > > > add > > > > > > > > >> > > > > >> > > topic > > > > > > > > >> > > > > >> > > > > ids > > > > > > > > >> > > > > >> > > > > > to the relevant protocols that don't > > > have it > > > > > > yet and > > > > > > > > >> > that we > > > > > > > > >> > > > > can > > > > > > > > >> > > > > >> > > probably > > > > > > > > >> > > > > >> > > > > > focus on the internals versus adding > new > > > APIs > > > > > > to the > > > > > > > > >> > Java > > > > > > > > >> > > > > >> Consumer > > > > > > > > >> > > > > >> > > > > (unless > > > > > > > > >> > > > > >> > > > > > we find that adding new APIs is > required > > > for > > > > > > > > >> reasonable > > > > > > > > >> > > > > >> semantics). > > > > > > > > >> > > > > >> > > > > > 3. I am still not sure about the > > > coordinator > > > > > > storing > > > > > > > > >> the > > > > > > > > >> > > > > >> configs. > > > > > > > > >> > > > > >> > > It's > > > > > > > > >> > > > > >> > > > > > powerful for configs to be > centralized > > > in the > > > > > > > > >> metadata > > > > > > > > >> > log for > > > > > > > > >> > > > > >> > > various > > > > > > > > >> > > > > >> > > > > > reasons (auditability, visibility, > > > consistency, > > > > > > > > >> etc.). > > > > > > > > >> > > > > >> Similarly, I > > > > > > > > >> > > > > >> > > am > > > > > > > > >> > > > > >> > > > > not > > > > > > > > >> > > > > >> > > > > > sure about automatically deleting > > > configs in a > > > > > > way > > > > > > > > >> that > > > > > > > > >> > they > > > > > > > > >> > > > > >> cannot > > > > > > > > >> > > > > >> > > be > > > > > > > > >> > > > > >> > > > > > recovered. A good property for modern > > > systems > > > > > > is to > > > > > > > > >> > minimize > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > > number > > > > > > > > >> > > > > >> > > > > of > > > > > > > > >> > > > > >> > > > > > unrecoverable data loss scenarios. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > >> > > > > >> > > > > > Ismael > > > > > > > > >> > > > > >> > > > > > > > > > > > > > >> > > > > >> > > > > > On Wed, Jul 13, 2022 at 3:47 PM David > > > Jacot > > > > > > > > >> > > > > >> > > <dja...@confluent.io.invalid > > > > > > > > >> > > > > >> > > > > > > > > > > > > > >> > > > > >> > > > > > wrote: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > >> > > > > >> > > > > > > Thanks Guozhang. My answers are > below: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > >> > > > > >> > > > > > > > 1) the migration path, especially > > > the last > > > > > > step > > > > > > > > >> when > > > > > > > > >> > > > > clients > > > > > > > > >> > > > > >> > > flip the > > > > > > > > >> > > > > >> > > > > > > flag > > > > > > > > >> > > > > >> > > > > > > > to enable the new protocol, in > which > > > we > > > > > > would > > > > > > > > >> have a > > > > > > > > >> > > > > window > > > > > > > > >> > > > > >> > where > > > > > > > > >> > > > > >> > > > > both > > > > > > > > >> > > > > >> > > > > > > new > > > > > > > > >> > > > > >> > > > > > > > protocols / rpcs and old > protocols / > > > rpcs > > > > > > are > > > > > > > > >> used > > > > > > > > >> > by > > > > > > > > >> > > > > >> members > > > > > > > > >> > > > > >> > of > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > same > > > > > > > > >> > > > > >> > > > > > > > group. How the coordinator could > > > "mimic" > > > > > > the old > > > > > > > > >> > behavior > > > > > > > > >> > > > > >> while > > > > > > > > >> > > > > >> > > > > using the > > > > > > > > >> > > > > >> > > > > > > > new protocol is something we > need to > > > > > > present > > > > > > > > >> about. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > >> > > > > >> > > > > > > Noted. I just published a new > version > > > of KIP > > > > > > which > > > > > > > > >> > includes > > > > > > > > >> > > > > >> more > > > > > > > > >> > > > > >> > > > > > > details about this. See the > "Supporting > > > > > > Online > > > > > > > > >> > Consumer > > > > > > > > >> > > > > Group > > > > > > > > >> > > > > >> > > Upgrade" > > > > > > > > >> > > > > >> > > > > > > and the "Compatibility, > Deprecation, > > > and > > > > > > Migration > > > > > > > > >> > Plan". I > > > > > > > > >> > > > > >> think > > > > > > > > >> > > > > >> > > that > > > > > > > > >> > > > > >> > > > > > > I have to think through a few cases > > > now but > > > > > > the > > > > > > > > >> > overall idea > > > > > > > > >> > > > > >> and > > > > > > > > >> > > > > >> > > > > > > mechanism should be understandable. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > >> > > > > >> > > > > > > > 2) the usage of topic ids. So > far as > > > > > > KIP-516 the > > > > > > > > >> > topic ids > > > > > > > > >> > > > > >> are > > > > > > > > >> > > > > >> > > only > > > > > > > > >> > > > > >> > > > > used > > > > > > > > >> > > > > >> > > > > > > as > > > > > > > > >> > > > > >> > > > > > > > part of RPCs and admin client, > but > > > they > > > > > > are not > > > > > > > > >> > exposed > > > > > > > > >> > > > > via > > > > > > > > >> > > > > >> any > > > > > > > > >> > > > > >> > > > > public > > > > > > > > >> > > > > >> > > > > > > APIs > > > > > > > > >> > > > > >> > > > > > > > to consumers yet. I think the > > > question is, > > > > > > first > > > > > > > > >> > should we > > > > > > > > >> > > > > >> let > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > consumer > > > > > > > > >> > > > > >> > > > > > > > client to be maintaining the > names > > > -> ids > > > > > > mapping > > > > > > > > >> > itself > > > > > > > > >> > > > > to > > > > > > > > >> > > > > >> > fully > > > > > > > > >> > > > > >> > > > > > > leverage > > > > > > > > >> > > > > >> > > > > > > > on all the augmented existing > RPCs > > > and the > > > > > > new > > > > > > > > >> RPCs > > > > > > > > >> > with > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > > topic > > > > > > > > >> > > > > >> > > > > ids; > > > > > > > > >> > > > > >> > > > > > > and > > > > > > > > >> > > > > >> > > > > > > > secondly, should we ever consider > > > exposing > > > > > > the > > > > > > > > >> > topic ids > > > > > > > > >> > > > > in > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > > > consumer > > > > > > > > >> > > > > >> > > > > > > > public APIs as well (both > > > > > > subscribe/assign, as > > > > > > > > >> well > > > > > > > > >> > as in > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > > > rebalance > > > > > > > > >> > > > > >> > > > > > > > listener for cases like topic > > > > > > > > >> > deletion-and-recreation). > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > >> > > > > >> > > > > > > a) Assuming that we would include > > > converting > > > > > > all > > > > > > > > >> the > > > > > > > > >> > offsets > > > > > > > > >> > > > > >> > > related > > > > > > > > >> > > > > >> > > > > > > RPCs to using topic ids in this > KIP, > > > the > > > > > > consumer > > > > > > > > >> > would be > > > > > > > > >> > > > > >> able > > > > > > > > >> > > > > >> > to > > > > > > > > >> > > > > >> > > > > > > fully operate with topic ids. That > > > being > > > > > > said, it > > > > > > > > >> > still has > > > > > > > > >> > > > > to > > > > > > > > >> > > > > >> > > provide > > > > > > > > >> > > > > >> > > > > > > the topics names in various APIs so > > > having a > > > > > > > > >> mapping > > > > > > > > >> > in the > > > > > > > > >> > > > > >> > > consumer > > > > > > > > >> > > > > >> > > > > > > seems inevitable to me. > > > > > > > > >> > > > > >> > > > > > > b) I don't have a strong opinion on > > > this. > > > > > > Here I > > > > > > > > >> > wonder if > > > > > > > > >> > > > > >> this > > > > > > > > >> > > > > >> > > goes > > > > > > > > >> > > > > >> > > > > > > beyond the scope of this KIP. I > would > > > rather > > > > > > focus > > > > > > > > >> on > > > > > > > > >> > the > > > > > > > > >> > > > > >> > internals > > > > > > > > >> > > > > >> > > > > > > here and we can consider this > > > separately if > > > > > > we see > > > > > > > > >> > value in > > > > > > > > >> > > > > >> doing > > > > > > > > >> > > > > >> > > it. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > >> > > > > >> > > > > > > Coming back to Ismael's point about > > > using > > > > > > topic ids > > > > > > > > >> > in the > > > > > > > > >> > > > > >> > > > > > > ConsumerGroupHeartbeatRequest, I > think > > > that > > > > > > there > > > > > > > > >> is > > > > > > > > >> > one > > > > > > > > >> > > > > >> > advantage > > > > > > > > >> > > > > >> > > in > > > > > > > > >> > > > > >> > > > > > > favour of it. The consumer will > have > > > the > > > > > > > > >> opportunity > > > > > > > > >> > to > > > > > > > > >> > > > > >> validate > > > > > > > > >> > > > > >> > > that > > > > > > > > >> > > > > >> > > > > > > the topics exists before passing > them > > > into > > > > > > the > > > > > > > > >> group > > > > > > > > >> > > > > rebalance > > > > > > > > >> > > > > >> > > > > > > protocol. Obviously, the > coordinator > > > will > > > > > > also > > > > > > > > >> notice > > > > > > > > >> > it but > > > > > > > > >> > > > > >> it > > > > > > > > >> > > > > >> > > does > > > > > > > > >> > > > > >> > > > > > > not really have a way to reject an > > > invalid > > > > > > topic in > > > > > > > > >> > the > > > > > > > > >> > > > > >> response. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > >> > > > > >> > > > > > > > I'm agreeing with David on all > other > > > minor > > > > > > > > >> questions > > > > > > > > >> > > > > except > > > > > > > > >> > > > > >> for > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > > `subscribe(Pattern)` question: > > > personally > > > > > > I think > > > > > > > > >> > it's not > > > > > > > > >> > > > > >> > > necessary > > > > > > > > >> > > > > >> > > > > to > > > > > > > > >> > > > > >> > > > > > > > deprecate the subscribe API with > > > Pattern, > > > > > > but > > > > > > > > >> > instead we > > > > > > > > >> > > > > >> still > > > > > > > > >> > > > > >> > > use > > > > > > > > >> > > > > >> > > > > > > Pattern > > > > > > > > >> > > > > >> > > > > > > > while just documenting that our > > > > > > subscription may > > > > > > > > >> be > > > > > > > > >> > > > > >> rejected by > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > server. > > > > > > > > >> > > > > >> > > > > > > > Since the incompatible case is a > > > very rare > > > > > > > > >> scenario > > > > > > > > >> > I felt > > > > > > > > >> > > > > >> > using > > > > > > > > >> > > > > >> > > an > > > > > > > > >> > > > > >> > > > > > > > overloaded `String` based > > > subscription may > > > > > > be > > > > > > > > >> more > > > > > > > > >> > > > > >> vulnerable > > > > > > > > >> > > > > >> > to > > > > > > > > >> > > > > >> > > > > various > > > > > > > > >> > > > > >> > > > > > > > invalid regexes. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > >> > > > > >> > > > > > > That could work. I have to look at > the > > > > > > differences > > > > > > > > >> > between > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > two > > > > > > > > >> > > > > >> > > > > > > engines to better understand the > > > potential > > > > > > issues. > > > > > > > > >> My > > > > > > > > >> > > > > >> > > understanding is > > > > > > > > >> > > > > >> > > > > > > that would work for all the basic > > > regular > > > > > > > > >> > expressions. The > > > > > > > > >> > > > > >> > > differences > > > > > > > > >> > > > > >> > > > > > > between the two are mainly about > the > > > various > > > > > > > > >> character > > > > > > > > >> > > > > >> classes. I > > > > > > > > >> > > > > >> > > > > > > wonder what other people think > about > > > this. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > >> > > > > >> > > > > > > Best, > > > > > > > > >> > > > > >> > > > > > > David > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > >> > > > > >> > > > > > > On Tue, Jul 12, 2022 at 11:28 PM > > > Guozhang > > > > > > Wang < > > > > > > > > >> > > > > >> > wangg...@gmail.com > > > > > > > > >> > > > > >> > > > > > > > > > > > >> > > > > >> > > > > wrote: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > Thanks David! I think on the high > > > level > > > > > > there are > > > > > > > > >> > two meta > > > > > > > > >> > > > > >> > > points we > > > > > > > > >> > > > > >> > > > > need > > > > > > > > >> > > > > >> > > > > > > > to concretize a bit more: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > 1) the migration path, especially > > > the last > > > > > > step > > > > > > > > >> when > > > > > > > > >> > > > > clients > > > > > > > > >> > > > > >> > > flip the > > > > > > > > >> > > > > >> > > > > > > flag > > > > > > > > >> > > > > >> > > > > > > > to enable the new protocol, in > which > > > we > > > > > > would > > > > > > > > >> have a > > > > > > > > >> > > > > window > > > > > > > > >> > > > > >> > where > > > > > > > > >> > > > > >> > > > > both > > > > > > > > >> > > > > >> > > > > > > new > > > > > > > > >> > > > > >> > > > > > > > protocols / rpcs and old > protocols / > > > rpcs > > > > > > are > > > > > > > > >> used > > > > > > > > >> > by > > > > > > > > >> > > > > >> members > > > > > > > > >> > > > > >> > of > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > same > > > > > > > > >> > > > > >> > > > > > > > group. How the coordinator could > > > "mimic" > > > > > > the old > > > > > > > > >> > behavior > > > > > > > > >> > > > > >> while > > > > > > > > >> > > > > >> > > > > using the > > > > > > > > >> > > > > >> > > > > > > > new protocol is something we > need to > > > > > > present > > > > > > > > >> about. > > > > > > > > >> > > > > >> > > > > > > > 2) the usage of topic ids. So > far as > > > > > > KIP-516 the > > > > > > > > >> > topic ids > > > > > > > > >> > > > > >> are > > > > > > > > >> > > > > >> > > only > > > > > > > > >> > > > > >> > > > > used > > > > > > > > >> > > > > >> > > > > > > as > > > > > > > > >> > > > > >> > > > > > > > part of RPCs and admin client, > but > > > they > > > > > > are not > > > > > > > > >> > exposed > > > > > > > > >> > > > > via > > > > > > > > >> > > > > >> any > > > > > > > > >> > > > > >> > > > > public > > > > > > > > >> > > > > >> > > > > > > APIs > > > > > > > > >> > > > > >> > > > > > > > to consumers yet. I think the > > > question is, > > > > > > first > > > > > > > > >> > should we > > > > > > > > >> > > > > >> let > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > consumer > > > > > > > > >> > > > > >> > > > > > > > client to be maintaining the > names > > > -> ids > > > > > > mapping > > > > > > > > >> > itself > > > > > > > > >> > > > > to > > > > > > > > >> > > > > >> > fully > > > > > > > > >> > > > > >> > > > > > > leverage > > > > > > > > >> > > > > >> > > > > > > > on all the augmented existing > RPCs > > > and the > > > > > > new > > > > > > > > >> RPCs > > > > > > > > >> > with > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > > topic > > > > > > > > >> > > > > >> > > > > ids; > > > > > > > > >> > > > > >> > > > > > > and > > > > > > > > >> > > > > >> > > > > > > > secondly, should we ever consider > > > exposing > > > > > > the > > > > > > > > >> > topic ids > > > > > > > > >> > > > > in > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > > > consumer > > > > > > > > >> > > > > >> > > > > > > > public APIs as well (both > > > > > > subscribe/assign, as > > > > > > > > >> well > > > > > > > > >> > as in > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > > > rebalance > > > > > > > > >> > > > > >> > > > > > > > listener for cases like topic > > > > > > > > >> > deletion-and-recreation). > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > I'm agreeing with David on all > other > > > minor > > > > > > > > >> questions > > > > > > > > >> > > > > except > > > > > > > > >> > > > > >> for > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > > `subscribe(Pattern)` question: > > > personally > > > > > > I think > > > > > > > > >> > it's not > > > > > > > > >> > > > > >> > > necessary > > > > > > > > >> > > > > >> > > > > to > > > > > > > > >> > > > > >> > > > > > > > deprecate the subscribe API with > > > Pattern, > > > > > > but > > > > > > > > >> > instead we > > > > > > > > >> > > > > >> still > > > > > > > > >> > > > > >> > > use > > > > > > > > >> > > > > >> > > > > > > Pattern > > > > > > > > >> > > > > >> > > > > > > > while just documenting that our > > > > > > subscription may > > > > > > > > >> be > > > > > > > > >> > > > > >> rejected by > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > server. > > > > > > > > >> > > > > >> > > > > > > > Since the incompatible case is a > > > very rare > > > > > > > > >> scenario > > > > > > > > >> > I felt > > > > > > > > >> > > > > >> > using > > > > > > > > >> > > > > >> > > an > > > > > > > > >> > > > > >> > > > > > > > overloaded `String` based > > > subscription may > > > > > > be > > > > > > > > >> more > > > > > > > > >> > > > > >> vulnerable > > > > > > > > >> > > > > >> > to > > > > > > > > >> > > > > >> > > > > various > > > > > > > > >> > > > > >> > > > > > > > invalid regexes. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > Guozhang > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > On Tue, Jul 12, 2022 at 5:23 AM > > > David Jacot > > > > > > > > >> > > > > >> > > > > <dja...@confluent.io.invalid > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > wrote: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > Hi Ismael, > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > Thanks for your feedback. Let > me > > > answer > > > > > > your > > > > > > > > >> > questions > > > > > > > > >> > > > > >> > inline. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 1. I think it's premature to > > > talk about > > > > > > > > >> target > > > > > > > > >> > > > > versions > > > > > > > > >> > > > > >> for > > > > > > > > >> > > > > >> > > > > > > deprecation > > > > > > > > >> > > > > >> > > > > > > > > and > > > > > > > > >> > > > > >> > > > > > > > > > removal of the existing group > > > protocol. > > > > > > > > >> Unlike > > > > > > > > >> > KRaft, > > > > > > > > >> > > > > >> this > > > > > > > > >> > > > > >> > > > > affects a > > > > > > > > >> > > > > >> > > > > > > core > > > > > > > > >> > > > > >> > > > > > > > > > client protocol and hence > > > > > > deprecation/removal > > > > > > > > >> > will be > > > > > > > > >> > > > > >> > heavily > > > > > > > > >> > > > > >> > > > > > > dependent > > > > > > > > >> > > > > >> > > > > > > > > on > > > > > > > > >> > > > > >> > > > > > > > > > how quickly applications > migrate > > > to > > > > > > the new > > > > > > > > >> > protocol. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > That makes sense. I will > remove it. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 2. The KIP says we intend to > > > release > > > > > > this in > > > > > > > > >> > 4.x, but > > > > > > > > >> > > > > it > > > > > > > > >> > > > > >> > > wasn't > > > > > > > > >> > > > > >> > > > > made > > > > > > > > >> > > > > >> > > > > > > > > clear > > > > > > > > >> > > > > >> > > > > > > > > > why. If we added that as a > way to > > > > > > estimate > > > > > > > > >> when > > > > > > > > >> > we'd > > > > > > > > >> > > > > >> > > deprecate > > > > > > > > >> > > > > >> > > > > and > > > > > > > > >> > > > > >> > > > > > > remove > > > > > > > > >> > > > > >> > > > > > > > > > the group protocol, I also > > > suggest > > > > > > removing > > > > > > > > >> > this part. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > Let me explain my reasoning. As > > > > > > explained, I > > > > > > > > >> plan > > > > > > > > >> > to > > > > > > > > >> > > > > >> rewrite > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > group > > > > > > > > >> > > > > >> > > > > > > > > coordinator in Java while we > > > implement > > > > > > the new > > > > > > > > >> > protocol. > > > > > > > > >> > > > > >> This > > > > > > > > >> > > > > >> > > means > > > > > > > > >> > > > > >> > > > > > > > > that the internals will be > slightly > > > > > > different > > > > > > > > >> > (e.g. > > > > > > > > >> > > > > >> threading > > > > > > > > >> > > > > >> > > > > model). > > > > > > > > >> > > > > >> > > > > > > > > Therefore, I wanted to tighten > the > > > > > > switch from > > > > > > > > >> > the old > > > > > > > > >> > > > > >> group > > > > > > > > >> > > > > >> > > > > > > > > coordinator to the new group > > > coordinator > > > > > > to a > > > > > > > > >> > major > > > > > > > > >> > > > > >> release. > > > > > > > > >> > > > > >> > > The > > > > > > > > >> > > > > >> > > > > > > > > alternative would be to use a > flag > > > to do > > > > > > the > > > > > > > > >> > switch > > > > > > > > >> > > > > >> instead > > > > > > > > >> > > > > >> > of > > > > > > > > >> > > > > >> > > > > relying > > > > > > > > >> > > > > >> > > > > > > > > on the software upgrade. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 3. We need to flesh out the > > > details of > > > > > > the > > > > > > > > >> > migration > > > > > > > > >> > > > > >> story. > > > > > > > > >> > > > > >> > > It > > > > > > > > >> > > > > >> > > > > sounds > > > > > > > > >> > > > > >> > > > > > > > > like > > > > > > > > >> > > > > >> > > > > > > > > > we're saying we will support > > > online > > > > > > > > >> migrations. > > > > > > > > >> > Is > > > > > > > > >> > > > > that > > > > > > > > >> > > > > >> > > correct? > > > > > > > > >> > > > > >> > > > > We > > > > > > > > >> > > > > >> > > > > > > > > should > > > > > > > > >> > > > > >> > > > > > > > > > explain this in detail. It > could > > > also > > > > > > be done > > > > > > > > >> > as a > > > > > > > > >> > > > > >> separate > > > > > > > > >> > > > > >> > > KIP, > > > > > > > > >> > > > > >> > > > > if > > > > > > > > >> > > > > >> > > > > > > it's > > > > > > > > >> > > > > >> > > > > > > > > > easier. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > Yes, we will support online > > > migrations > > > > > > for the > > > > > > > > >> > group. > > > > > > > > >> > > > > That > > > > > > > > >> > > > > >> > > means > > > > > > > > >> > > > > >> > > > > that > > > > > > > > >> > > > > >> > > > > > > > > a group using the old protocol > > > will be > > > > > > able to > > > > > > > > >> > switch to > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > new > > > > > > > > >> > > > > >> > > > > > > > > protocol. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > Let me briefly explain how that > > > will work > > > > > > > > >> though. > > > > > > > > >> > It is > > > > > > > > >> > > > > >> > > basically a > > > > > > > > >> > > > > >> > > > > > > > > four step process: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > 1. The cluster must be > upgraded or > > > > > > rolled to a > > > > > > > > >> > software > > > > > > > > >> > > > > >> > > supporting > > > > > > > > >> > > > > >> > > > > the > > > > > > > > >> > > > > >> > > > > > > > > new group coordinator. Both the > > > old and > > > > > > the new > > > > > > > > >> > > > > >> coordinator > > > > > > > > >> > > > > >> > > will > > > > > > > > >> > > > > >> > > > > > > > > support the old protocol and > rely > > > on the > > > > > > same > > > > > > > > >> > persisted > > > > > > > > >> > > > > >> > > metadata so > > > > > > > > >> > > > > >> > > > > > > > > they can work together. This > point > > > is an > > > > > > > > >> offline > > > > > > > > >> > > > > >> migration. > > > > > > > > >> > > > > >> > We > > > > > > > > >> > > > > >> > > > > cannot > > > > > > > > >> > > > > >> > > > > > > > > do this one live because it > would > > > require > > > > > > > > >> > shutting down > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > current > > > > > > > > >> > > > > >> > > > > > > > > coordinator and starting up the > > > new one > > > > > > and > > > > > > > > >> that > > > > > > > > >> > would > > > > > > > > >> > > > > >> cause > > > > > > > > >> > > > > >> > > > > > > > > unavailabilities. > > > > > > > > >> > > > > >> > > > > > > > > 2. The cluster's metadata > > > version/IBP > > > > > > must be > > > > > > > > >> > upgraded > > > > > > > > >> > > > > to > > > > > > > > >> > > > > >> X > > > > > > > > >> > > > > >> > in > > > > > > > > >> > > > > >> > > > > order > > > > > > > > >> > > > > >> > > > > > > > > to enable the new protocol. > This > > > cannot > > > > > > be done > > > > > > > > >> > before > > > > > > > > >> > > > > 1) > > > > > > > > >> > > > > >> is > > > > > > > > >> > > > > >> > > > > > > > > terminated because the old > > > coordinator > > > > > > doesn't > > > > > > > > >> > support > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > new > > > > > > > > >> > > > > >> > > > > > > > > protocol. > > > > > > > > >> > > > > >> > > > > > > > > 3. The consumers must be > upgraded > > > to a > > > > > > version > > > > > > > > >> > > > > supporting > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > > > online > > > > > > > > >> > > > > >> > > > > > > > > migration (must have KIP-792). > If > > > the > > > > > > consumer > > > > > > > > >> is > > > > > > > > >> > > > > already > > > > > > > > >> > > > > >> > > there. > > > > > > > > >> > > > > >> > > > > > > > > Nothing must be done at this > point. > > > > > > > > >> > > > > >> > > > > > > > > 4. The consumers must be rolled > > > with the > > > > > > > > >> feature > > > > > > > > >> > flag > > > > > > > > >> > > > > >> turned > > > > > > > > >> > > > > >> > > on. > > > > > > > > >> > > > > >> > > > > The > > > > > > > > >> > > > > >> > > > > > > > > consumer group is automatically > > > > > > converted when > > > > > > > > >> > the first > > > > > > > > >> > > > > >> > > consumer > > > > > > > > >> > > > > >> > > > > > > > > using the new protocol joins > the > > > group. > > > > > > While > > > > > > > > >> the > > > > > > > > >> > > > > members > > > > > > > > >> > > > > >> > > using the > > > > > > > > >> > > > > >> > > > > > > > > old protocol are being > upgraded, > > > the old > > > > > > > > >> protocol > > > > > > > > >> > is > > > > > > > > >> > > > > >> proxied > > > > > > > > >> > > > > >> > > into > > > > > > > > >> > > > > >> > > > > the > > > > > > > > >> > > > > >> > > > > > > > > new one. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > Let me clarify all of this in > the > > > KIP. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 4. I am happy that we are > > > pushing the > > > > > > pattern > > > > > > > > >> > > > > >> subscriptions > > > > > > > > >> > > > > >> > > to > > > > > > > > >> > > > > >> > > > > the > > > > > > > > >> > > > > >> > > > > > > > > server, > > > > > > > > >> > > > > >> > > > > > > > > > but it seems like there > could be > > > some > > > > > > tricky > > > > > > > > >> > > > > >> compatibility > > > > > > > > >> > > > > >> > > > > issues. > > > > > > > > >> > > > > >> > > > > > > Will > > > > > > > > >> > > > > >> > > > > > > > > we > > > > > > > > >> > > > > >> > > > > > > > > > have a mechanism for users to > > > detect > > > > > > that > > > > > > > > >> they > > > > > > > > >> > need to > > > > > > > > >> > > > > >> > update > > > > > > > > >> > > > > >> > > > > their > > > > > > > > >> > > > > >> > > > > > > regex > > > > > > > > >> > > > > >> > > > > > > > > > before switching to the new > > > protocol? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > I think that I am a bit more > > > optimistic > > > > > > than > > > > > > > > >> you > > > > > > > > >> > on this > > > > > > > > >> > > > > >> > > point. I > > > > > > > > >> > > > > >> > > > > > > > > believe that the majority of > the > > > cases > > > > > > are > > > > > > > > >> simple > > > > > > > > >> > > > > regexes > > > > > > > > >> > > > > >> > which > > > > > > > > >> > > > > >> > > > > should > > > > > > > > >> > > > > >> > > > > > > > > work with the new engine. The > > > > > > coordinator will > > > > > > > > >> > verify > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > regex > > > > > > > > >> > > > > >> > > > > anyway > > > > > > > > >> > > > > >> > > > > > > > > and reject the consumer if the > > > regex is > > > > > > not > > > > > > > > >> valid. > > > > > > > > >> > > > > Coming > > > > > > > > >> > > > > >> > back > > > > > > > > >> > > > > >> > > to > > > > > > > > >> > > > > >> > > > > the > > > > > > > > >> > > > > >> > > > > > > > > migration path, in the worst > case, > > > the > > > > > > first > > > > > > > > >> > upgraded > > > > > > > > >> > > > > >> > consumer > > > > > > > > >> > > > > >> > > > > joining > > > > > > > > >> > > > > >> > > > > > > > > the group will be rejected. > This > > > should > > > > > > be used > > > > > > > > >> > as the > > > > > > > > >> > > > > >> last > > > > > > > > >> > > > > >> > > > > defence, I > > > > > > > > >> > > > > >> > > > > > > > > would say. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > One way for customers to > validate > > > their > > > > > > regex > > > > > > > > >> > before > > > > > > > > >> > > > > >> > upgrading > > > > > > > > >> > > > > >> > > > > their > > > > > > > > >> > > > > >> > > > > > > > > prod would be to test them with > > > another > > > > > > group. > > > > > > > > >> For > > > > > > > > >> > > > > >> instance, > > > > > > > > >> > > > > >> > > that > > > > > > > > >> > > > > >> > > > > > > > > could be done in a pre-prod > > > environment. > > > > > > > > >> Another > > > > > > > > >> > way > > > > > > > > >> > > > > >> would be > > > > > > > > >> > > > > >> > > to > > > > > > > > >> > > > > >> > > > > > > > > extend the consumer-group tool > to > > > > > > provide a > > > > > > > > >> regex > > > > > > > > >> > > > > >> validation > > > > > > > > >> > > > > >> > > > > > > > > mechanism. Would this be > enough in > > > your > > > > > > > > >> opinion? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 5. Related to the last > question, > > > will > > > > > > the > > > > > > > > >> Java > > > > > > > > >> > client > > > > > > > > >> > > > > >> allow > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > users to > > > > > > > > >> > > > > >> > > > > > > > > > stick with the current regex > > > engine for > > > > > > > > >> > compatibility > > > > > > > > >> > > > > >> > > reasons? > > > > > > > > >> > > > > >> > > > > For > > > > > > > > >> > > > > >> > > > > > > > > example, > > > > > > > > >> > > > > >> > > > > > > > > > it may be handy to keep using > > > client > > > > > > based > > > > > > > > >> > regex at > > > > > > > > >> > > > > >> first > > > > > > > > >> > > > > >> > to > > > > > > > > >> > > > > >> > > keep > > > > > > > > >> > > > > >> > > > > > > > > > migrations simple and then > > > migrate to > > > > > > server > > > > > > > > >> > based > > > > > > > > >> > > > > >> regexes > > > > > > > > >> > > > > >> > > as a > > > > > > > > >> > > > > >> > > > > > > second > > > > > > > > >> > > > > >> > > > > > > > > step. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > I understand your point but I > am > > > > > > concerned that > > > > > > > > >> > this > > > > > > > > >> > > > > would > > > > > > > > >> > > > > >> > > allow > > > > > > > > >> > > > > >> > > > > users > > > > > > > > >> > > > > >> > > > > > > > > to actually stay in this mode. > That > > > > > > would go > > > > > > > > >> > against our > > > > > > > > >> > > > > >> goal > > > > > > > > >> > > > > >> > > of > > > > > > > > >> > > > > >> > > > > > > > > simplifying the client because > we > > > would > > > > > > have to > > > > > > > > >> > continue > > > > > > > > >> > > > > >> > > monitoring > > > > > > > > >> > > > > >> > > > > > > > > the metadata on the client > side. I > > > would > > > > > > rather > > > > > > > > >> > not do > > > > > > > > >> > > > > >> this. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 6. When we say that the group > > > > > > coordinator > > > > > > > > >> will > > > > > > > > >> > be > > > > > > > > >> > > > > >> > > responsible for > > > > > > > > >> > > > > >> > > > > > > storing > > > > > > > > >> > > > > >> > > > > > > > > > the configurations and that > the > > > > > > > > >> configurations > > > > > > > > >> > will be > > > > > > > > >> > > > > >> > > deleted > > > > > > > > >> > > > > >> > > > > when > > > > > > > > >> > > > > >> > > > > > > the > > > > > > > > >> > > > > >> > > > > > > > > > group is deleted. Will a > > > transition to > > > > > > DEAD > > > > > > > > >> > trigger > > > > > > > > >> > > > > >> > deletion > > > > > > > > >> > > > > >> > > of > > > > > > > > >> > > > > >> > > > > > > > > > configurations? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > That's right. The > configurations > > > will be > > > > > > > > >> deleted > > > > > > > > >> > when > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > > group is > > > > > > > > >> > > > > >> > > > > > > > > deleted. They go together. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 7. Will the choice to store > the > > > > > > configs in > > > > > > > > >> the > > > > > > > > >> > group > > > > > > > > >> > > > > >> > > coordinator > > > > > > > > >> > > > > >> > > > > > > make it > > > > > > > > >> > > > > >> > > > > > > > > > harder to list all cluster > > > configs and > > > > > > their > > > > > > > > >> > values? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > I don't think so. The group > > > > > > configurations are > > > > > > > > >> > overrides > > > > > > > > >> > > > > >> of > > > > > > > > >> > > > > >> > > cluster > > > > > > > > >> > > > > >> > > > > > > > > configs. If you want to know > all > > > the > > > > > > overrides > > > > > > > > >> > though, > > > > > > > > >> > > > > you > > > > > > > > >> > > > > >> > > would > > > > > > > > >> > > > > >> > > > > have > > > > > > > > >> > > > > >> > > > > > > > > to ask all the group > coordinators. > > > You > > > > > > cannot > > > > > > > > >> > rely on > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > > metadata > > > > > > > > >> > > > > >> > > > > log > > > > > > > > >> > > > > >> > > > > > > > > for instance. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 8. How would someone > configure a > > > group > > > > > > before > > > > > > > > >> > starting > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > > > consumers? > > > > > > > > >> > > > > >> > > > > > > > > Have > > > > > > > > >> > > > > >> > > > > > > > > > we considered allowing the > > > explicit > > > > > > creation > > > > > > > > >> of > > > > > > > > >> > > > > groups? > > > > > > > > >> > > > > >> > > > > > > Alternatively, > > > > > > > > >> > > > > >> > > > > > > > > the > > > > > > > > >> > > > > >> > > > > > > > > > configs could be decoupled > from > > > the > > > > > > group > > > > > > > > >> > lifecycle. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > Yes. The group will be > > > automatically > > > > > > created in > > > > > > > > >> > this > > > > > > > > >> > > > > case. > > > > > > > > >> > > > > >> > > However, > > > > > > > > >> > > > > >> > > > > > > > > the configs will be lost after > the > > > > > > retention > > > > > > > > >> > period of > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > > group > > > > > > > > >> > > > > >> > > > > > > > > passes. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 9. Will the > Consumer.subscribe > > > method > > > > > > for the > > > > > > > > >> > Java > > > > > > > > >> > > > > >> client > > > > > > > > >> > > > > >> > > still > > > > > > > > >> > > > > >> > > > > take > > > > > > > > >> > > > > >> > > > > > > a > > > > > > > > >> > > > > >> > > > > > > > > > `java.util.regex.Pattern` of > do > > > we > > > > > > have to > > > > > > > > >> > introduce > > > > > > > > >> > > > > an > > > > > > > > >> > > > > >> > > overload? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > That's a very group question. I > > > forgot > > > > > > about > > > > > > > > >> that > > > > > > > > >> > one. > > > > > > > > >> > > > > As > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > > > > > > > `java.util.regex.Pattern` is > not > > > fully > > > > > > > > >> compatible > > > > > > > > >> > with > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > > engine > > > > > > > > >> > > > > >> > > > > that > > > > > > > > >> > > > > >> > > > > > > > > we plan to use, it might be > better > > > to > > > > > > deprecate > > > > > > > > >> > it and > > > > > > > > >> > > > > >> use an > > > > > > > > >> > > > > >> > > > > overload > > > > > > > > >> > > > > >> > > > > > > > > which takes a string. We would > > > rely on > > > > > > the > > > > > > > > >> server > > > > > > > > >> > side > > > > > > > > >> > > > > >> > > validation. > > > > > > > > >> > > > > >> > > > > > > > > During the migration, I think > that > > > we > > > > > > could > > > > > > > > >> still > > > > > > > > >> > try to > > > > > > > > >> > > > > >> > > toString > > > > > > > > >> > > > > >> > > > > the > > > > > > > > >> > > > > >> > > > > > > > > regex and use it. That should > > > work, I > > > > > > think, in > > > > > > > > >> > the > > > > > > > > >> > > > > >> majority > > > > > > > > >> > > > > >> > > of the > > > > > > > > >> > > > > >> > > > > > > > > cases. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 10. I agree with Justine > that we > > > > > > should be > > > > > > > > >> > clearer > > > > > > > > >> > > > > about > > > > > > > > >> > > > > >> > the > > > > > > > > >> > > > > >> > > > > reason > > > > > > > > >> > > > > >> > > > > > > to > > > > > > > > >> > > > > >> > > > > > > > > > switch to > IBP/metadata.version > > > from the > > > > > > > > >> feature > > > > > > > > >> > flag. > > > > > > > > >> > > > > >> Maybe > > > > > > > > >> > > > > >> > > we > > > > > > > > >> > > > > >> > > > > mean > > > > > > > > >> > > > > >> > > > > > > that > > > > > > > > >> > > > > >> > > > > > > > > we > > > > > > > > >> > > > > >> > > > > > > > > > can switch the default for > the > > > feature > > > > > > flag > > > > > > > > >> to > > > > > > > > >> > true > > > > > > > > >> > > > > >> based > > > > > > > > >> > > > > >> > on > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > > > > metadata.version once we > want to > > > make > > > > > > it the > > > > > > > > >> > default. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > My plan was to use that feature > > > flag > > > > > > mainly > > > > > > > > >> > during the > > > > > > > > >> > > > > >> > > development > > > > > > > > >> > > > > >> > > > > > > > > phase. I should not have > mentioned > > > it, I > > > > > > think, > > > > > > > > >> > because > > > > > > > > >> > > > > we > > > > > > > > >> > > > > >> > > could > > > > > > > > >> > > > > >> > > > > use > > > > > > > > >> > > > > >> > > > > > > > > an internal config for it. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 11. Some of the protocol APIs > > > don't > > > > > > mention > > > > > > > > >> the > > > > > > > > >> > > > > required > > > > > > > > >> > > > > >> > > ACLs, it > > > > > > > > >> > > > > >> > > > > > > would > > > > > > > > >> > > > > >> > > > > > > > > be > > > > > > > > >> > > > > >> > > > > > > > > > good to add that for > consistency. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > Noted. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 12. It is a bit odd that > > > > > > > > >> ConsumerGroupHeartbeat > > > > > > > > >> > > > > requires > > > > > > > > >> > > > > >> > > "Read > > > > > > > > >> > > > > >> > > > > Group" > > > > > > > > >> > > > > >> > > > > > > > > even > > > > > > > > >> > > > > >> > > > > > > > > > though it seems to do more > than > > > > > > reading. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > I agree. This is how the > current > > > > > > protocol works > > > > > > > > >> > though. > > > > > > > > >> > > > > We > > > > > > > > >> > > > > >> > only > > > > > > > > >> > > > > >> > > > > > > > > require "Read Group" to join a > > > group. We > > > > > > could > > > > > > > > >> > consider > > > > > > > > >> > > > > >> > > changing > > > > > > > > >> > > > > >> > > > > this > > > > > > > > >> > > > > >> > > > > > > > > but I am not sure that it is > worth > > > it. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 13. How is topic recreation > > > handled by > > > > > > the > > > > > > > > >> > consumer > > > > > > > > >> > > > > with > > > > > > > > >> > > > > >> > the > > > > > > > > >> > > > > >> > > new > > > > > > > > >> > > > > >> > > > > > > group > > > > > > > > >> > > > > >> > > > > > > > > > protocol? It would be good to > > > have a > > > > > > section > > > > > > > > >> on > > > > > > > > >> > this. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > Noted. From a protocol > > > perspective, the > > > > > > new > > > > > > > > >> topic > > > > > > > > >> > will > > > > > > > > >> > > > > >> have a > > > > > > > > >> > > > > >> > > new > > > > > > > > >> > > > > >> > > > > > > > > topic id so it will treat it > like a > > > > > > topic with > > > > > > > > >> a > > > > > > > > >> > > > > different > > > > > > > > >> > > > > >> > > name. > > > > > > > > >> > > > > >> > > > > The > > > > > > > > >> > > > > >> > > > > > > > > only issue is that the > fetch/commit > > > > > > offsets > > > > > > > > >> APIs > > > > > > > > >> > do not > > > > > > > > >> > > > > >> > support > > > > > > > > >> > > > > >> > > > > topic > > > > > > > > >> > > > > >> > > > > > > > > IDs so the consumer would > reuse the > > > > > > offsets > > > > > > > > >> based > > > > > > > > >> > on the > > > > > > > > >> > > > > >> > same. > > > > > > > > >> > > > > >> > > I > > > > > > > > >> > > > > >> > > > > think > > > > > > > > >> > > > > >> > > > > > > > > that we should update those > APIs > > > as well > > > > > > in > > > > > > > > >> order > > > > > > > > >> > to be > > > > > > > > >> > > > > >> > > consistent > > > > > > > > >> > > > > >> > > > > end > > > > > > > > >> > > > > >> > > > > > > > > to end. That would strengthen > the > > > > > > semantics of > > > > > > > > >> the > > > > > > > > >> > > > > >> consumer. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 14. The KIP mentions we will > > > write the > > > > > > new > > > > > > > > >> > coordinator > > > > > > > > >> > > > > >> in > > > > > > > > >> > > > > >> > > Java. > > > > > > > > >> > > > > >> > > > > Even > > > > > > > > >> > > > > >> > > > > > > > > though > > > > > > > > >> > > > > >> > > > > > > > > > this is an implementation > > > detail, do > > > > > > we plan > > > > > > > > >> to > > > > > > > > >> > have a > > > > > > > > >> > > > > >> new > > > > > > > > >> > > > > >> > > gradle > > > > > > > > >> > > > > >> > > > > > > module > > > > > > > > >> > > > > >> > > > > > > > > > for it? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > Yes. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 15. Do we have a scalability > > > goal when > > > > > > it > > > > > > > > >> comes > > > > > > > > >> > to how > > > > > > > > >> > > > > >> many > > > > > > > > >> > > > > >> > > > > members > > > > > > > > >> > > > > >> > > > > > > the > > > > > > > > >> > > > > >> > > > > > > > > new > > > > > > > > >> > > > > >> > > > > > > > > > group protocol can support? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > We don't have numbers at the > > > moment. The > > > > > > > > >> protocol > > > > > > > > >> > should > > > > > > > > >> > > > > >> > > support > > > > > > > > >> > > > > >> > > > > 1000s > > > > > > > > >> > > > > >> > > > > > > > > of members per group. We will > > > measure > > > > > > this when > > > > > > > > >> > we have > > > > > > > > >> > > > > a > > > > > > > > >> > > > > >> > first > > > > > > > > >> > > > > >> > > > > > > > > implementation. Note that we > might > > > have > > > > > > other > > > > > > > > >> > > > > bottlenecks > > > > > > > > >> > > > > >> > down > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > > > road (e.g. offset commits). > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > 16. Did we consider having > > > > > > SubscribedTopidIds > > > > > > > > >> > instead > > > > > > > > >> > > > > of > > > > > > > > >> > > > > >> > > > > > > > > > SubscribedTopicNames in > > > > > > > > >> > ConsumerGroupHeartbeatRequest? > > > > > > > > >> > > > > >> Is > > > > > > > > >> > > > > >> > the > > > > > > > > >> > > > > >> > > > > idea > > > > > > > > >> > > > > >> > > > > > > that > > > > > > > > >> > > > > >> > > > > > > > > > since we have to resolve the > > > regex on > > > > > > the > > > > > > > > >> > server, we > > > > > > > > >> > > > > >> can do > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > same > > > > > > > > >> > > > > >> > > > > > > for > > > > > > > > >> > > > > >> > > > > > > > > > the topic name? The > difference > > > is that > > > > > > > > >> sending > > > > > > > > >> > the > > > > > > > > >> > > > > >> regex is > > > > > > > > >> > > > > >> > > more > > > > > > > > >> > > > > >> > > > > > > > > efficient > > > > > > > > >> > > > > >> > > > > > > > > > whereas sending the topic > names > > > is less > > > > > > > > >> > efficient. > > > > > > > > >> > > > > >> > > Furthermore, > > > > > > > > >> > > > > >> > > > > > > delete > > > > > > > > >> > > > > >> > > > > > > > > and > > > > > > > > >> > > > > >> > > > > > > > > > recreation is easier to > handle > > > if we > > > > > > have > > > > > > > > >> topic > > > > > > > > >> > ids. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > The idea was to consolidate the > > > metadata > > > > > > lookup > > > > > > > > >> > on the > > > > > > > > >> > > > > >> server > > > > > > > > >> > > > > >> > > for > > > > > > > > >> > > > > >> > > > > both > > > > > > > > >> > > > > >> > > > > > > > > paths but I do agree with your > > > point. As > > > > > > a > > > > > > > > >> second > > > > > > > > >> > > > > though, > > > > > > > > >> > > > > >> > using > > > > > > > > >> > > > > >> > > > > topic > > > > > > > > >> > > > > >> > > > > > > > > ids may be better here for the > > > delete and > > > > > > > > >> > recreation > > > > > > > > >> > > > > case. > > > > > > > > >> > > > > >> > > Also, I > > > > > > > > >> > > > > >> > > > > > > > > suppose that we may allow > users to > > > > > > subscribe > > > > > > > > >> with > > > > > > > > >> > topic > > > > > > > > >> > > > > >> ids > > > > > > > > >> > > > > >> > in > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > > > future because that is the only > > > way to be > > > > > > > > >> really > > > > > > > > >> > robust > > > > > > > > >> > > > > to > > > > > > > > >> > > > > >> > > topic > > > > > > > > >> > > > > >> > > > > > > > > re-creation. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > Best, > > > > > > > > >> > > > > >> > > > > > > > > David > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > On Tue, Jul 12, 2022 at 1:38 PM > > > David > > > > > > Jacot < > > > > > > > > >> > > > > >> > > dja...@confluent.io> > > > > > > > > >> > > > > >> > > > > > > wrote: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > Hi Justine, > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > Thanks for your comments. > Please > > > find > > > > > > my > > > > > > > > >> answers > > > > > > > > >> > > > > below. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > - Yes, the new protocol > relies on > > > > > > topic IDs > > > > > > > > >> > with the > > > > > > > > >> > > > > >> > > exception > > > > > > > > >> > > > > >> > > > > of the > > > > > > > > >> > > > > >> > > > > > > > > > topic names based in the > > > > > > > > >> > > > > ConsumerGroupHeartbeatRequest. > > > > > > > > >> > > > > >> I > > > > > > > > >> > > > > >> > am > > > > > > > > >> > > > > >> > > not > > > > > > > > >> > > > > >> > > > > sure > > > > > > > > >> > > > > >> > > > > > > > > > if using topic names is the > > > right call > > > > > > here. > > > > > > > > >> I > > > > > > > > >> > need to > > > > > > > > >> > > > > >> > think > > > > > > > > >> > > > > >> > > > > about it > > > > > > > > >> > > > > >> > > > > > > > > > a little more. Obviously, the > > > KIP does > > > > > > not > > > > > > > > >> > change the > > > > > > > > >> > > > > >> > > > > fetch/commit > > > > > > > > >> > > > > >> > > > > > > > > > offsets RPCs to use topic > IDs. > > > This > > > > > > may be > > > > > > > > >> > something > > > > > > > > >> > > > > >> that > > > > > > > > >> > > > > >> > we > > > > > > > > >> > > > > >> > > > > should > > > > > > > > >> > > > > >> > > > > > > > > > include though as it would > give > > > better > > > > > > > > >> overall > > > > > > > > >> > > > > >> guarantee in > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > > > > producer. > > > > > > > > >> > > > > >> > > > > > > > > > - You're right. I think that > I > > > should > > > > > > not > > > > > > > > >> have > > > > > > > > >> > > > > mentioned > > > > > > > > >> > > > > >> > this > > > > > > > > >> > > > > >> > > > > flag at > > > > > > > > >> > > > > >> > > > > > > > > > all. I will remove it. We can > > > use an > > > > > > internal > > > > > > > > >> > > > > >> configuration > > > > > > > > >> > > > > >> > > while > > > > > > > > >> > > > > >> > > > > > > > > > developing the feature. > > > > > > > > >> > > > > >> > > > > > > > > > - Both cluster types will be > > > > > > supported. The > > > > > > > > >> > change is > > > > > > > > >> > > > > >> > > > > orthogonal. The > > > > > > > > >> > > > > >> > > > > > > > > > only requirement is that the > > > cluster > > > > > > uses > > > > > > > > >> topic > > > > > > > > >> > IDs. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > Best, > > > > > > > > >> > > > > >> > > > > > > > > > David > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > On Mon, Jul 11, 2022 at 9:53 > PM > > > > > > Guozhang > > > > > > > > >> Wang < > > > > > > > > >> > > > > >> > > > > wangg...@gmail.com> > > > > > > > > >> > > > > >> > > > > > > > > wrote: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > Hi Ismael, > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > Thanks for the feedback. > Here > > > are > > > > > > some > > > > > > > > >> replies > > > > > > > > >> > > > > inlined > > > > > > > > >> > > > > >> > > below: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > On Sat, Jul 9, 2022 at > 2:53 AM > > > > > > Ismael Juma > > > > > > > > >> < > > > > > > > > >> > > > > >> > > ism...@juma.me.uk> > > > > > > > > >> > > > > >> > > > > > > wrote: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > Thanks for the KIP. This > has > > > the > > > > > > > > >> potential > > > > > > > > >> > to be a > > > > > > > > >> > > > > >> > great > > > > > > > > >> > > > > >> > > > > > > > > improvement. A few > > > > > > > > >> > > > > >> > > > > > > > > > > > initial > questions/comments: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 1. I think it's > premature to > > > talk > > > > > > about > > > > > > > > >> > target > > > > > > > > >> > > > > >> versions > > > > > > > > >> > > > > >> > > for > > > > > > > > >> > > > > >> > > > > > > > > deprecation and > > > > > > > > >> > > > > >> > > > > > > > > > > > removal of the existing > group > > > > > > protocol. > > > > > > > > >> > Unlike > > > > > > > > >> > > > > >> KRaft, > > > > > > > > >> > > > > >> > > this > > > > > > > > >> > > > > >> > > > > > > affects a > > > > > > > > >> > > > > >> > > > > > > > > core > > > > > > > > >> > > > > >> > > > > > > > > > > > client protocol and hence > > > > > > > > >> > deprecation/removal will > > > > > > > > >> > > > > >> be > > > > > > > > >> > > > > >> > > heavily > > > > > > > > >> > > > > >> > > > > > > > > dependent on > > > > > > > > >> > > > > >> > > > > > > > > > > > how quickly applications > > > migrate > > > > > > to the > > > > > > > > >> new > > > > > > > > >> > > > > >> protocol. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > Yeah I agree with you. I > think > > > we can > > > > > > > > >> remove > > > > > > > > >> > the > > > > > > > > >> > > > > >> proposed > > > > > > > > >> > > > > >> > > > > timeline > > > > > > > > >> > > > > >> > > > > > > in > > > > > > > > >> > > > > >> > > > > > > > > the > > > > > > > > >> > > > > >> > > > > > > > > > > `Compatibility, > Deprecation, > > > and > > > > > > Migration > > > > > > > > >> > Plan` and > > > > > > > > >> > > > > >> > > instead > > > > > > > > >> > > > > >> > > > > just > > > > > > > > >> > > > > >> > > > > > > state > > > > > > > > >> > > > > >> > > > > > > > > > > that we will decide in the > > > future > > > > > > about > > > > > > > > >> when > > > > > > > > >> > we > > > > > > > > >> > > > > would > > > > > > > > >> > > > > >> > > > > deprecate old > > > > > > > > >> > > > > >> > > > > > > > > > > protocol and behaviors. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 2. The KIP says we > intend to > > > > > > release this > > > > > > > > >> > in 4.x, > > > > > > > > >> > > > > >> but > > > > > > > > >> > > > > >> > it > > > > > > > > >> > > > > >> > > > > wasn't > > > > > > > > >> > > > > >> > > > > > > made > > > > > > > > >> > > > > >> > > > > > > > > clear > > > > > > > > >> > > > > >> > > > > > > > > > > > why. If we added that as > a > > > way to > > > > > > > > >> estimate > > > > > > > > >> > when > > > > > > > > >> > > > > we'd > > > > > > > > >> > > > > >> > > > > deprecate > > > > > > > > >> > > > > >> > > > > > > and > > > > > > > > >> > > > > >> > > > > > > > > remove > > > > > > > > >> > > > > >> > > > > > > > > > > > the group protocol, I > also > > > suggest > > > > > > > > >> removing > > > > > > > > >> > this > > > > > > > > >> > > > > >> part. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > I think that's not > specifically > > > > > > related to > > > > > > > > >> the > > > > > > > > >> > > > > >> > > > > deprecation/removal > > > > > > > > >> > > > > >> > > > > > > > > timeline > > > > > > > > >> > > > > >> > > > > > > > > > > plan, but it's more for > client > > > > > > upgrades. > > > > > > > > >> I.e. > > > > > > > > >> > the > > > > > > > > >> > > > > >> > > broker-side > > > > > > > > >> > > > > >> > > > > > > > > > > implementation may be done > > > first, > > > > > > and then > > > > > > > > >> the > > > > > > > > >> > > > > client > > > > > > > > >> > > > > >> > side, > > > > > > > > >> > > > > >> > > > > and we > > > > > > > > >> > > > > >> > > > > > > > > would > > > > > > > > >> > > > > >> > > > > > > > > > > only mark it as "released" > by > > > the > > > > > > time > > > > > > > > >> clients > > > > > > > > >> > > > > >> > > implementations > > > > > > > > >> > > > > >> > > > > are > > > > > > > > >> > > > > >> > > > > > > > > done. At > > > > > > > > >> > > > > >> > > > > > > > > > > that time, to enable the > > > feature the > > > > > > > > >> clients > > > > > > > > >> > need to > > > > > > > > >> > > > > >> > first > > > > > > > > >> > > > > >> > > > > swap-in > > > > > > > > >> > > > > >> > > > > > > the > > > > > > > > >> > > > > >> > > > > > > > > > > bytecode with a rolling > bounce > > > and > > > > > > then set > > > > > > > > >> > the flag > > > > > > > > >> > > > > >> > with a > > > > > > > > >> > > > > >> > > > > second > > > > > > > > >> > > > > >> > > > > > > > > rolling > > > > > > > > >> > > > > >> > > > > > > > > > > bounce, and hence we feel > it's > > > > > > better to be > > > > > > > > >> > released > > > > > > > > >> > > > > >> in a > > > > > > > > >> > > > > >> > > major > > > > > > > > >> > > > > >> > > > > > > > > version. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 3. We need to flesh out > the > > > > > > details of > > > > > > > > >> the > > > > > > > > >> > > > > migration > > > > > > > > >> > > > > >> > > story. > > > > > > > > >> > > > > >> > > > > It > > > > > > > > >> > > > > >> > > > > > > > > sounds like > > > > > > > > >> > > > > >> > > > > > > > > > > > we're saying we will > support > > > online > > > > > > > > >> > migrations. Is > > > > > > > > >> > > > > >> that > > > > > > > > >> > > > > >> > > > > correct? > > > > > > > > >> > > > > >> > > > > > > We > > > > > > > > >> > > > > >> > > > > > > > > should > > > > > > > > >> > > > > >> > > > > > > > > > > > explain this in detail. > It > > > could > > > > > > also be > > > > > > > > >> > done as a > > > > > > > > >> > > > > >> > > separate > > > > > > > > >> > > > > >> > > > > KIP, > > > > > > > > >> > > > > >> > > > > > > if > > > > > > > > >> > > > > >> > > > > > > > > it's > > > > > > > > >> > > > > >> > > > > > > > > > > > easier. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > Yes I think that's the > part we > > > can > > > > > > be more > > > > > > > > >> > concrete > > > > > > > > >> > > > > >> about > > > > > > > > >> > > > > >> > > for > > > > > > > > >> > > > > >> > > > > sure > > > > > > > > >> > > > > >> > > > > > > (and > > > > > > > > >> > > > > >> > > > > > > > > > > this is related to your > > > question 2) > > > > > > above). > > > > > > > > >> > We will > > > > > > > > >> > > > > >> work > > > > > > > > >> > > > > >> > on > > > > > > > > >> > > > > >> > > > > making > > > > > > > > >> > > > > >> > > > > > > it > > > > > > > > >> > > > > >> > > > > > > > > more > > > > > > > > >> > > > > >> > > > > > > > > > > explicit in parallel as we > > > solicit > > > > > > more > > > > > > > > >> > feedback. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 4. I am happy that we are > > > pushing > > > > > > the > > > > > > > > >> > pattern > > > > > > > > >> > > > > >> > > subscriptions > > > > > > > > >> > > > > >> > > > > to > > > > > > > > >> > > > > >> > > > > > > the > > > > > > > > >> > > > > >> > > > > > > > > server, > > > > > > > > >> > > > > >> > > > > > > > > > > > but it seems like there > > > could be > > > > > > some > > > > > > > > >> tricky > > > > > > > > >> > > > > >> > > compatibility > > > > > > > > >> > > > > >> > > > > > > issues. > > > > > > > > >> > > > > >> > > > > > > > > Will we > > > > > > > > >> > > > > >> > > > > > > > > > > > have a mechanism for > users to > > > > > > detect that > > > > > > > > >> > they > > > > > > > > >> > > > > need > > > > > > > > >> > > > > >> to > > > > > > > > >> > > > > >> > > update > > > > > > > > >> > > > > >> > > > > > > their > > > > > > > > >> > > > > >> > > > > > > > > regex > > > > > > > > >> > > > > >> > > > > > > > > > > > before switching to the > new > > > > > > protocol? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > Yes I think we need some > > > tooling for > > > > > > > > >> non-java > > > > > > > > >> > client > > > > > > > > >> > > > > >> > users > > > > > > > > >> > > > > >> > > to > > > > > > > > >> > > > > >> > > > > sort > > > > > > > > >> > > > > >> > > > > > > of > > > > > > > > >> > > > > >> > > > > > > > > > > "dry-run" the client before > > > > > > switching to > > > > > > > > >> the > > > > > > > > >> > new > > > > > > > > >> > > > > >> > protocol. > > > > > > > > >> > > > > >> > > I > > > > > > > > >> > > > > >> > > > > do not > > > > > > > > >> > > > > >> > > > > > > > > have a > > > > > > > > >> > > > > >> > > > > > > > > > > specific idea on top of my > head > > > > > > though, > > > > > > > > >> maybe > > > > > > > > >> > others > > > > > > > > >> > > > > >> like > > > > > > > > >> > > > > >> > > @Matt > > > > > > > > >> > > > > >> > > > > > > > > Howlett can > > > > > > > > >> > > > > >> > > > > > > > > > > chime-in here? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 5. Related to the last > > > question, > > > > > > will the > > > > > > > > >> > Java > > > > > > > > >> > > > > >> client > > > > > > > > >> > > > > >> > > allow > > > > > > > > >> > > > > >> > > > > the > > > > > > > > >> > > > > >> > > > > > > > > users to > > > > > > > > >> > > > > >> > > > > > > > > > > > stick with the current > regex > > > > > > engine for > > > > > > > > >> > > > > >> compatibility > > > > > > > > >> > > > > >> > > > > reasons? > > > > > > > > >> > > > > >> > > > > > > For > > > > > > > > >> > > > > >> > > > > > > > > example, > > > > > > > > >> > > > > >> > > > > > > > > > > > it may be handy to keep > using > > > > > > client > > > > > > > > >> based > > > > > > > > >> > regex > > > > > > > > >> > > > > at > > > > > > > > >> > > > > >> > > first to > > > > > > > > >> > > > > >> > > > > keep > > > > > > > > >> > > > > >> > > > > > > > > > > > migrations simple and > then > > > migrate > > > > > > to > > > > > > > > >> > server based > > > > > > > > >> > > > > >> > > regexes > > > > > > > > >> > > > > >> > > > > as a > > > > > > > > >> > > > > >> > > > > > > > > second > > > > > > > > >> > > > > >> > > > > > > > > > > > step. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > Honestly I have not thought > > > about > > > > > > that for > > > > > > > > >> > java > > > > > > > > >> > > > > >> clients, > > > > > > > > >> > > > > >> > > and > > > > > > > > >> > > > > >> > > > > we can > > > > > > > > >> > > > > >> > > > > > > > > discuss > > > > > > > > >> > > > > >> > > > > > > > > > > that. What kind of > > > compatibility > > > > > > issues do > > > > > > > > >> > you have > > > > > > > > >> > > > > in > > > > > > > > >> > > > > >> > > mind? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 6. When we say that the > group > > > > > > coordinator > > > > > > > > >> > will be > > > > > > > > >> > > > > >> > > > > responsible for > > > > > > > > >> > > > > >> > > > > > > > > storing > > > > > > > > >> > > > > >> > > > > > > > > > > > the configurations and > that > > > the > > > > > > > > >> > configurations > > > > > > > > >> > > > > will > > > > > > > > >> > > > > >> be > > > > > > > > >> > > > > >> > > > > deleted > > > > > > > > >> > > > > >> > > > > > > when > > > > > > > > >> > > > > >> > > > > > > > > the > > > > > > > > >> > > > > >> > > > > > > > > > > > group is deleted. Will a > > > > > > transition to > > > > > > > > >> DEAD > > > > > > > > >> > > > > trigger > > > > > > > > >> > > > > >> > > deletion > > > > > > > > >> > > > > >> > > > > of > > > > > > > > >> > > > > >> > > > > > > > > > > > configurations? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > Yes, since the DEAD state > is an > > > > > > ending > > > > > > > > >> state > > > > > > > > >> > (we > > > > > > > > >> > > > > would > > > > > > > > >> > > > > >> > only > > > > > > > > >> > > > > >> > > > > > > transit to > > > > > > > > >> > > > > >> > > > > > > > > that > > > > > > > > >> > > > > >> > > > > > > > > > > state when the group is > EMPTY > > > and > > > > > > also all > > > > > > > > >> of > > > > > > > > >> > its > > > > > > > > >> > > > > >> > metadata > > > > > > > > >> > > > > >> > > are > > > > > > > > >> > > > > >> > > > > > > gone), > > > > > > > > >> > > > > >> > > > > > > > > once > > > > > > > > >> > > > > >> > > > > > > > > > > it's transited to DEAD this > > > group > > > > > > would > > > > > > > > >> never > > > > > > > > >> > be > > > > > > > > >> > > > > >> revived. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 7. Will the choice to > store > > > the > > > > > > configs > > > > > > > > >> in > > > > > > > > >> > the > > > > > > > > >> > > > > group > > > > > > > > >> > > > > >> > > > > coordinator > > > > > > > > >> > > > > >> > > > > > > > > make it > > > > > > > > >> > > > > >> > > > > > > > > > > > harder to list all > cluster > > > configs > > > > > > and > > > > > > > > >> their > > > > > > > > >> > > > > values? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > That's a good question, > and our > > > > > > thoughts > > > > > > > > >> are > > > > > > > > >> > that > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > > so-called > > > > > > > > >> > > > > >> > > > > > > "group > > > > > > > > >> > > > > >> > > > > > > > > > > configurations" are > overrides > > > of the > > > > > > > > >> > cluster-level > > > > > > > > >> > > > > >> > > > > configurations > > > > > > > > >> > > > > >> > > > > > > > > > > customized per group so > when an > > > > > > admin list > > > > > > > > >> > cluster > > > > > > > > >> > > > > >> > configs > > > > > > > > >> > > > > >> > > it's > > > > > > > > >> > > > > >> > > > > > > okay to > > > > > > > > >> > > > > >> > > > > > > > > > > list just the cluster-level > > > > > > "defaults", not > > > > > > > > >> > showing > > > > > > > > >> > > > > >> any > > > > > > > > >> > > > > >> > > > > per-group > > > > > > > > >> > > > > >> > > > > > > > > > > customizations. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 8. How would someone > > > configure a > > > > > > group > > > > > > > > >> > before > > > > > > > > >> > > > > >> starting > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > > > consumers? Have > > > > > > > > >> > > > > >> > > > > > > > > > > > we considered allowing > the > > > explicit > > > > > > > > >> > creation of > > > > > > > > >> > > > > >> groups? > > > > > > > > >> > > > > >> > > > > > > > > Alternatively, the > > > > > > > > >> > > > > >> > > > > > > > > > > > configs could be > decoupled > > > from > > > > > > the group > > > > > > > > >> > > > > lifecycle. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > The configs can be created > > > before > > > > > > the group > > > > > > > > >> > itself > > > > > > > > >> > > > > as > > > > > > > > >> > > > > >> an > > > > > > > > >> > > > > >> > > > > > > independent > > > > > > > > >> > > > > >> > > > > > > > > entity > > > > > > > > >> > > > > >> > > > > > > > > > > --- of course, this > requires > > > the > > > > > > > > >> corresponding > > > > > > > > >> > > > > >> request to > > > > > > > > >> > > > > >> > > be > > > > > > > > >> > > > > >> > > > > > > routed to > > > > > > > > >> > > > > >> > > > > > > > > the > > > > > > > > >> > > > > >> > > > > > > > > > > right coordinator based on > the > > > group > > > > > > id --- > > > > > > > > >> > the only > > > > > > > > >> > > > > >> > thing > > > > > > > > >> > > > > >> > > that > > > > > > > > >> > > > > >> > > > > > > > > differs is, > > > > > > > > >> > > > > >> > > > > > > > > > > when the group itself is > gone > > > we > > > > > > also check > > > > > > > > >> > if there > > > > > > > > >> > > > > >> are > > > > > > > > >> > > > > >> > > any > > > > > > > > >> > > > > >> > > > > > > > > configuration > > > > > > > > >> > > > > >> > > > > > > > > > > entities related to that > group > > > and > > > > > > delete > > > > > > > > >> as > > > > > > > > >> > well. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > Admittedly this indeed > > > introduces an > > > > > > > > >> > asymmetry on > > > > > > > > >> > > > > the > > > > > > > > >> > > > > >> > > creation > > > > > > > > >> > > > > >> > > > > / > > > > > > > > >> > > > > >> > > > > > > > > deletion > > > > > > > > >> > > > > >> > > > > > > > > > > lifecycles of the config > > > entities, > > > > > > and we > > > > > > > > >> > would like > > > > > > > > >> > > > > >> to > > > > > > > > >> > > > > >> > > hear > > > > > > > > >> > > > > >> > > > > > > everyone's > > > > > > > > >> > > > > >> > > > > > > > > > > feelings whether we should > aim > > > for > > > > > > symmetry > > > > > > > > >> > i.e. > > > > > > > > >> > > > > >> totally > > > > > > > > >> > > > > >> > > > > decouple > > > > > > > > >> > > > > >> > > > > > > group > > > > > > > > >> > > > > >> > > > > > > > > > > configs and hence not > delete > > > them at > > > > > > all > > > > > > > > >> when > > > > > > > > >> > the > > > > > > > > >> > > > > >> group > > > > > > > > >> > > > > >> > is > > > > > > > > >> > > > > >> > > > > gone, > > > > > > > > >> > > > > >> > > > > > > but > > > > > > > > >> > > > > >> > > > > > > > > always > > > > > > > > >> > > > > >> > > > > > > > > > > require explicit deletion > > > operations > > > > > > by > > > > > > > > >> > themselves. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 9. Will the > > > Consumer.subscribe > > > > > > method for > > > > > > > > >> > the Java > > > > > > > > >> > > > > >> > client > > > > > > > > >> > > > > >> > > > > still > > > > > > > > >> > > > > >> > > > > > > take > > > > > > > > >> > > > > >> > > > > > > > > a > > > > > > > > >> > > > > >> > > > > > > > > > > > > `java.util.regex.Pattern` of > > > do we > > > > > > have > > > > > > > > >> to > > > > > > > > >> > > > > >> introduce an > > > > > > > > >> > > > > >> > > > > overload? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > I think we do not need to > > > introduce > > > > > > an > > > > > > > > >> > overload, but > > > > > > > > >> > > > > >> I'm > > > > > > > > >> > > > > >> > > all > > > > > > > > >> > > > > >> > > > > ears > > > > > > > > >> > > > > >> > > > > > > if > > > > > > > > >> > > > > >> > > > > > > > > there > > > > > > > > >> > > > > >> > > > > > > > > > > may be some compatibility > > > issues > > > > > > that we > > > > > > > > >> may > > > > > > > > >> > > > > overlook. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 10. I agree with Justine > > > that we > > > > > > should > > > > > > > > >> be > > > > > > > > >> > clearer > > > > > > > > >> > > > > >> > about > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > > > reason > > > > > > > > >> > > > > >> > > > > > > > > to > > > > > > > > >> > > > > >> > > > > > > > > > > > switch to > > > IBP/metadata.version > > > > > > from the > > > > > > > > >> > feature > > > > > > > > >> > > > > >> flag. > > > > > > > > >> > > > > >> > > Maybe > > > > > > > > >> > > > > >> > > > > we > > > > > > > > >> > > > > >> > > > > > > mean > > > > > > > > >> > > > > >> > > > > > > > > that we > > > > > > > > >> > > > > >> > > > > > > > > > > > can switch the default > for > > > the > > > > > > feature > > > > > > > > >> flag > > > > > > > > >> > to > > > > > > > > >> > > > > true > > > > > > > > >> > > > > >> > > based on > > > > > > > > >> > > > > >> > > > > the > > > > > > > > >> > > > > >> > > > > > > > > > > > metadata.version once we > > > want to > > > > > > make it > > > > > > > > >> the > > > > > > > > >> > > > > >> default. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > 11. Some of the protocol > APIs > > > don't > > > > > > mention > > > > > > > > >> > the > > > > > > > > >> > > > > >> required > > > > > > > > >> > > > > >> > > ACLs, > > > > > > > > >> > > > > >> > > > > it > > > > > > > > >> > > > > >> > > > > > > > > would be > > > > > > > > >> > > > > >> > > > > > > > > > > > good to add that for > > > consistency. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > Ack, we can certainly do > that. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 12. It is a bit odd that > > > > > > > > >> > ConsumerGroupHeartbeat > > > > > > > > >> > > > > >> > requires > > > > > > > > >> > > > > >> > > > > "Read > > > > > > > > >> > > > > >> > > > > > > > > Group" even > > > > > > > > >> > > > > >> > > > > > > > > > > > though it seems to do > more > > > than > > > > > > reading. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > I had that thought myself > as > > > well, > > > > > > but in > > > > > > > > >> the > > > > > > > > >> > end we > > > > > > > > >> > > > > >> > could > > > > > > > > >> > > > > >> > > not > > > > > > > > >> > > > > >> > > > > > > find a > > > > > > > > >> > > > > >> > > > > > > > > > > better alternative: adding > > > Write > > > > > > Group > > > > > > > > >> seems > > > > > > > > >> > an > > > > > > > > >> > > > > >> overkill > > > > > > > > >> > > > > >> > > here > > > > > > > > >> > > > > >> > > > > > > since we > > > > > > > > >> > > > > >> > > > > > > > > do > > > > > > > > >> > > > > >> > > > > > > > > > > not have it elsewhere (we > only > > > have > > > > > > Read / > > > > > > > > >> > Delete > > > > > > > > >> > > > > and > > > > > > > > >> > > > > >> > > Describe > > > > > > > > >> > > > > >> > > > > on > > > > > > > > >> > > > > >> > > > > > > > > groups so > > > > > > > > >> > > > > >> > > > > > > > > > > far). Would like to hear > others > > > > > > thoughts. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 13. How is topic > recreation > > > > > > handled by > > > > > > > > >> the > > > > > > > > >> > > > > consumer > > > > > > > > >> > > > > >> > with > > > > > > > > >> > > > > >> > > the > > > > > > > > >> > > > > >> > > > > new > > > > > > > > >> > > > > >> > > > > > > > > group > > > > > > > > >> > > > > >> > > > > > > > > > > > protocol? It would be > good > > > to have > > > > > > a > > > > > > > > >> > section on > > > > > > > > >> > > > > >> this. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > You mean with regex > > > subscription > > > > > > right? Yes > > > > > > > > >> > we can > > > > > > > > >> > > > > >> add a > > > > > > > > >> > > > > >> > > > > section > > > > > > > > >> > > > > >> > > > > > > about > > > > > > > > >> > > > > >> > > > > > > > > > > that, but basically the > idea > > > is that > > > > > > > > >> consumer > > > > > > > > >> > would > > > > > > > > >> > > > > be > > > > > > > > >> > > > > >> > > totally > > > > > > > > >> > > > > >> > > > > > > > > agnostic in > > > > > > > > >> > > > > >> > > > > > > > > > > the new protocol as it's > > > handled all > > > > > > by the > > > > > > > > >> > brokers. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 14. The KIP mentions we > will > > > write > > > > > > the > > > > > > > > >> new > > > > > > > > >> > > > > >> coordinator > > > > > > > > >> > > > > >> > in > > > > > > > > >> > > > > >> > > > > Java. > > > > > > > > >> > > > > >> > > > > > > Even > > > > > > > > >> > > > > >> > > > > > > > > though > > > > > > > > >> > > > > >> > > > > > > > > > > > this is an implementation > > > detail, > > > > > > do we > > > > > > > > >> > plan to > > > > > > > > >> > > > > >> have a > > > > > > > > >> > > > > >> > > new > > > > > > > > >> > > > > >> > > > > gradle > > > > > > > > >> > > > > >> > > > > > > > > module > > > > > > > > >> > > > > >> > > > > > > > > > > > for it? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > We have not thought about > > > that. But > > > > > > I think > > > > > > > > >> > the > > > > > > > > >> > > > > answer > > > > > > > > >> > > > > >> > > should > > > > > > > > >> > > > > >> > > > > be > > > > > > > > >> > > > > >> > > > > > > yes. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 15. Do we have a > scalability > > > goal > > > > > > when it > > > > > > > > >> > comes to > > > > > > > > >> > > > > >> how > > > > > > > > >> > > > > >> > > many > > > > > > > > >> > > > > >> > > > > > > members > > > > > > > > >> > > > > >> > > > > > > > > the new > > > > > > > > >> > > > > >> > > > > > > > > > > > group protocol can > support? > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > Within a group, I think we > > > should > > > > > > shoot for > > > > > > > > >> > 1000s of > > > > > > > > >> > > > > >> > > members. > > > > > > > > >> > > > > >> > > > > But > > > > > > > > >> > > > > >> > > > > > > that > > > > > > > > >> > > > > >> > > > > > > > > > > scalability goals also > depend > > > on the > > > > > > offset > > > > > > > > >> > > > > management > > > > > > > > >> > > > > >> > > (commit, > > > > > > > > >> > > > > >> > > > > > > fetch) > > > > > > > > >> > > > > >> > > > > > > > > > > capabilities of the > coordinator > > > > > > which we > > > > > > > > >> did > > > > > > > > >> > not > > > > > > > > >> > > > > >> cover in > > > > > > > > >> > > > > >> > > this > > > > > > > > >> > > > > >> > > > > > > KIP, so > > > > > > > > >> > > > > >> > > > > > > > > it's > > > > > > > > >> > > > > >> > > > > > > > > > > hard to give a number that > > > applies > > > > > > > > >> > universally. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > 16. Did we consider > having > > > > > > > > >> > SubscribedTopidIds > > > > > > > > >> > > > > >> instead > > > > > > > > >> > > > > >> > of > > > > > > > > >> > > > > >> > > > > > > > > > > > SubscribedTopicNames in > > > > > > > > >> > > > > >> ConsumerGroupHeartbeatRequest? > > > > > > > > >> > > > > >> > > Is the > > > > > > > > >> > > > > >> > > > > > > idea > > > > > > > > >> > > > > >> > > > > > > > > that > > > > > > > > >> > > > > >> > > > > > > > > > > > since we have to resolve > the > > > regex > > > > > > on the > > > > > > > > >> > server, > > > > > > > > >> > > > > we > > > > > > > > >> > > > > >> > can > > > > > > > > >> > > > > >> > > do > > > > > > > > >> > > > > >> > > > > the > > > > > > > > >> > > > > >> > > > > > > same > > > > > > > > >> > > > > >> > > > > > > > > for > > > > > > > > >> > > > > >> > > > > > > > > > > > the topic name? The > > > difference is > > > > > > that > > > > > > > > >> > sending the > > > > > > > > >> > > > > >> > regex > > > > > > > > >> > > > > >> > > is > > > > > > > > >> > > > > >> > > > > more > > > > > > > > >> > > > > >> > > > > > > > > efficient > > > > > > > > >> > > > > >> > > > > > > > > > > > whereas sending the topic > > > names is > > > > > > less > > > > > > > > >> > efficient. > > > > > > > > >> > > > > >> > > > > Furthermore, > > > > > > > > >> > > > > >> > > > > > > > > delete and > > > > > > > > >> > > > > >> > > > > > > > > > > > recreation is easier to > > > handle if > > > > > > we have > > > > > > > > >> > topic > > > > > > > > >> > > > > ids. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > The main reason to still > let > > > the > > > > > > clients > > > > > > > > >> send > > > > > > > > >> > names > > > > > > > > >> > > > > >> is to > > > > > > > > >> > > > > >> > > keep > > > > > > > > >> > > > > >> > > > > the > > > > > > > > >> > > > > >> > > > > > > > > > > reasoning of names -> ids > on > > > the > > > > > > broker / > > > > > > > > >> > admin > > > > > > > > >> > > > > client > > > > > > > > >> > > > > >> > > only. > > > > > > > > >> > > > > >> > > > > Note > > > > > > > > >> > > > > >> > > > > > > that > > > > > > > > >> > > > > >> > > > > > > > > > > although we added topic id > in > > > > > > KIP-516, we > > > > > > > > >> > never > > > > > > > > >> > > > > >> > > implemented the > > > > > > > > >> > > > > >> > > > > > > logic > > > > > > > > >> > > > > >> > > > > > > > > on > > > > > > > > >> > > > > >> > > > > > > > > > > consumer/producers > leveraging > > > the > > > > > > related > > > > > > > > >> > newer > > > > > > > > >> > > > > >> versioned > > > > > > > > >> > > > > >> > > RPCs, > > > > > > > > >> > > > > >> > > > > > > > > instead we > > > > > > > > >> > > > > >> > > > > > > > > > > just set the topic id as > empty > > > UUID. > > > > > > We > > > > > > > > >> want > > > > > > > > >> > to keep > > > > > > > > >> > > > > >> the > > > > > > > > >> > > > > >> > > > > > > > > consumer/producer > > > > > > > > >> > > > > >> > > > > > > > > > > to be thin and only > delegate > > > the > > > > > > reasoning > > > > > > > > >> on > > > > > > > > >> > broker > > > > > > > > >> > > > > >> and > > > > > > > > >> > > > > >> > > > > > > potentially > > > > > > > > >> > > > > >> > > > > > > > > admin > > > > > > > > >> > > > > >> > > > > > > > > > > clients. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > Thanks, > > > > > > > > >> > > > > >> > > > > > > > > > > > Ismael > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > On Wed, Jul 6, 2022 at > 10:45 > > > AM > > > > > > David > > > > > > > > >> Jacot > > > > > > > > >> > > > > >> > > > > > > > > <dja...@confluent.io.invalid> > > > > > > > > >> > > > > >> > > > > > > > > > > > wrote: > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > Hi all, > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > I would like to start a > > > > > > discussion > > > > > > > > >> thread > > > > > > > > >> > on > > > > > > > > >> > > > > >> KIP-848: > > > > > > > > >> > > > > >> > > The > > > > > > > > >> > > > > >> > > > > Next > > > > > > > > >> > > > > >> > > > > > > > > > > > > Generation of the > Consumer > > > > > > Rebalance > > > > > > > > >> > Protocol. > > > > > > > > >> > > > > >> With > > > > > > > > >> > > > > >> > > this > > > > > > > > >> > > > > >> > > > > KIP, > > > > > > > > >> > > > > >> > > > > > > we > > > > > > > > >> > > > > >> > > > > > > > > aim > > > > > > > > >> > > > > >> > > > > > > > > > > > > to make the rebalance > > > protocol > > > > > > (for > > > > > > > > >> > consumers) > > > > > > > > >> > > > > >> more > > > > > > > > >> > > > > >> > > > > reliable, > > > > > > > > >> > > > > >> > > > > > > more > > > > > > > > >> > > > > >> > > > > > > > > > > > > scalable, easier to > > > implement for > > > > > > > > >> > clients, and > > > > > > > > >> > > > > >> easier > > > > > > > > >> > > > > >> > > to > > > > > > > > >> > > > > >> > > > > debug > > > > > > > > >> > > > > >> > > > > > > for > > > > > > > > >> > > > > >> > > > > > > > > > > > > operators. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > The KIP is here: > > > > > > > > >> > > > > >> > > > > > > > https://cwiki.apache.org/confluence/x/HhD1D. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > Please take a look and > let > > > me > > > > > > know what > > > > > > > > >> > you > > > > > > > > >> > > > > think. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > Best, > > > > > > > > >> > > > > >> > > > > > > > > > > > > David > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > PS: I will be away from > > > July > > > > > > 18th to > > > > > > > > >> > August 8th. > > > > > > > > >> > > > > >> That > > > > > > > > >> > > > > >> > > gives > > > > > > > > >> > > > > >> > > > > > > you a > > > > > > > > >> > > > > >> > > > > > > > > bit > > > > > > > > >> > > > > >> > > > > > > > > > > > > of time to read and > digest > > > this > > > > > > long > > > > > > > > >> KIP. > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > -- > > > > > > > > >> > > > > >> > > > > > > > > > > -- Guozhang > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > > >> > > > > >> > > > > > > > -- > > > > > > > > >> > > > > >> > > > > > > > -- Guozhang > > > > > > > > >> > > > > >> > > > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > > > >> > > > > >> > > > > > > > > > > >> > > > > >> > > > > > > > > > > >> > > > > >> > > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> -- > > > > > > > > >> > > > > >> -- Guozhang > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > -- > > -- Guozhang > -- -- Guozhang