Hi,

Thanks for the update. I have a few nits:
> If the member ID is null or empty, the server will reject the request with an InvalidRequestException.

We should clarify that this should only apply to version >= 1.

> The consumer instance must generate a member ID, and this ID should remain consistent for the duration of the consumer's session. Here, a "session" is defined as the period from the consumer's first heartbeat until it leaves the group, either through a graceful shutdown, a heartbeat timeout, or the process stopping or dying. The consumer instance should reuse the same member ID for all heartbeats and rejoin attempts to maintain continuity within the group.

This part is not clear to me. When the member leaves the group, it should not reset the member id. I would rather say that the member must generate its member id when it starts and it must keep it until the process stops. It is basically an incarnation of the process.

> If a conflict arises where the member ID generated by the client is detected to be a duplicate within the same group (for example, the same member ID is associated with another active member in the group), the server will handle this by comparing the memberEpoch values of the conflicting members. The member with the lower memberEpoch is considered outdated and will be fenced off by the server. When this occurs, the server responds with a FENCED_MEMBER_EPOCH error to the client, signaling it to rejoin the group with the same member ID while resetting the memberEpoch to zero. This ensures that the client properly resynchronizes and maintains the continuity and consistency of the group membership.

This part is not clear either. It basically says that if a member joins with an existing member id but a different epoch, it will be fenced. Then it must rejoin with the same member id and epoch zero. This is already the current behavior and it does not help with detecting duplicates, right? Should we just remove the paragraph?
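
A minimal sketch of the lifecycle described above, in Python and purely illustrative. `ConsumerMember`, `validate_member_id`, and `InvalidRequestError` are hypothetical names rather than Kafka APIs, and `uuid.uuid4()` merely stands in for whatever encoding Kafka's `Uuid` class uses. It assumes the member id is generated once per process, is kept across fencing, and is only mandatory from version 1 on:

```python
import uuid
from typing import Optional

FENCED_MEMBER_EPOCH = "FENCED_MEMBER_EPOCH"  # error name as used in this thread


class InvalidRequestError(Exception):
    """Hypothetical stand-in for Kafka's InvalidRequestException."""


def validate_member_id(api_version: int, member_id: Optional[str]) -> None:
    """Server-side check (assumed): reject a null/empty id only for version >= 1."""
    if api_version >= 1 and not member_id:
        raise InvalidRequestError("member id is required for version >= 1")


class ConsumerMember:
    """Hypothetical client-side holder of the member id and member epoch."""

    def __init__(self) -> None:
        # Generated once when the process starts: an incarnation id of the process.
        self.member_id = str(uuid.uuid4())
        self.member_epoch = 0

    def on_heartbeat_response(self, error: Optional[str], new_epoch: int) -> None:
        if error == FENCED_MEMBER_EPOCH:
            # Rejoin with the SAME member id; only the epoch is reset to zero.
            self.member_epoch = 0
        elif error is None:
            self.member_epoch = new_epoch
```

In this sketch fencing only ever touches the epoch; nothing short of a process restart produces a new member id.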

> A member ID mismatch occurs within a session: If the server detects a mismatch between the provided member ID and the expected member ID for an ongoing session, it should return a UNKNOWN_MEMBER_ID error.

How could we detect a mismatch between the provided and the expected member id? My understanding is that we can only know whether the provided member id exists or not. This is already implemented.

Thanks,
David

On Sat, Sep 14, 2024 at 9:31 AM TengYao Chi <kiting...@gmail.com> wrote:

> Hello everyone,
>
> Since this KIP has been fully discussed, I will initiate a vote for it next Monday.
> Thank you and have a nice weekend.
>
> Best regards,
> TengYao
>
> On Thu, Sep 5, 2024 at 2:19 PM TengYao Chi <kiting...@gmail.com> wrote:
>
> > Hello everyone,
> >
> > KT2: It looks like everyone who has expressed an opinion supports the second option: “Document a recommendation for clients to use UUIDs as member IDs, without strictly enforcing it.”
> > I have updated the KIP accordingly.
> > Please take a look, and let me know if you have any thoughts or feedback.
> >
> > Thank you!
> >
> > Best regards,
> > TengYao
> >
> > On Fri, Aug 30, 2024 at 9:56 PM Chia-Ping Tsai <chia7...@gmail.com> wrote:
> >
> >> hi TengYao
> >>
> >> KT2: +1 to second approach
> >>
> >> Best,
> >> Chia-Ping
> >>
> >> On Fri, Aug 30, 2024 at 9:15 PM David Jacot <dja...@confluent.io.invalid> wrote:
> >>
> >> > Hi TengYao,
> >> >
> >> > KT2: I don't think that we can realistically validate the uuid on the server. It is basically a string of chars. So I lean towards having a good recommendation in the KIP and in the documentation of the field in the RPC's definition.
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On Fri, Aug 30, 2024 at 3:02 PM TengYao Chi <kiting...@gmail.com> wrote:
> >> >
> >> > > Hello Kirk!
> >> > >
> >> > > Thank you for your comments!
> >> > >
> >> > > KT1: Yes, you are correct.
> >> > > The issue is not unique to the initial heartbeat; there can always be cases where the broker might lose connection with a member.
> >> > >
> >> > > KT2: Currently, if the client doesn't have a member ID and the memberEpoch equals 0, the coordinator will generate a UUID as the member ID for the client. However, at the RPC level, the member ID is sent as a literal string, meaning there are no restrictions on the format at this level.
> >> > > This also reminds me that we haven't reached a final conclusion on how to enforce the use of UUIDs.
> >> > > From our previous discussions, I recall two possible approaches:
> >> > > The first is to validate the UUID on the server side, and if it's not valid, throw an exception to the client.
> >> > > The second is to document a recommendation for clients to use UUIDs as member IDs, without strictly enforcing it.
> >> > > I think it's time to decide on the approach we want to take.
> >> > >
> >> > > KT3: Yes, "session" can be considered synonymous with "membership" in this context.
> >> > >
> >> > > KT4: Thank you for pointing that out. I will update the wording to specifically say this behavior is for consumers.
> >> > >
> >> > > Thanks again for your comments.
> >> > >
> >> > > Best regards,
> >> > > TengYao
> >> > >
> >> > > On Fri, Aug 30, 2024 at 12:39 AM Kirk True <k...@kirktrue.pro> wrote:
> >> > >
> >> > > > Hi TengYao!
> >> > > >
> >> > > > Sorry for being late to the discussion...
> >> > > >
> >> > > > After reading the thread and then the KIP, I had a few questions/comments:
> >> > > >
> >> > > > KT1: In Motivation, it states: “This scenario can result in the broker registering a new member for which it will never receive a proper leave request.” Just to be clear, the broker will always have cases where it might lose connection with a member.
> >> > > > That’s not unique to the initial heartbeat, right?
> >> > > >
> >> > > > KT2: There was a bit of back and forth about the format of the member ID. From what I gathered in the thread, the member ID is still defined in the RPC as a string and not a UUID, right? The KIP states that the “client must generate a UUID as the member ID” and that the “server will validate that a valid UUID is provided.” Is that a change for the server, or is it already enforced as a UUID?
> >> > > >
> >> > > > KT3: Lianet mentioned some confusion over the use of the word “session.” Isn’t “session” synonymous with “membership?”
> >> > > >
> >> > > > KT4: Under “Member ID Lifecycle,” it states: “The client should reuse the same UUID as the member ID for all heartbeats and rejoin attempts to maintain continuity within the group.” Could we change the first part of that to “The Consumer instance should…” We do have lifetimes that extend past the lifetime of a client instance (such as the transaction ID).
> >> > > >
> >> > > > Thanks,
> >> > > > Kirk
> >> > > >
> >> > > > > On Aug 29, 2024, at 1:28 AM, TengYao Chi <kiting...@gmail.com> wrote:
> >> > > > >
> >> > > > > Hi David,
> >> > > > >
> >> > > > > Thank you for pointing that out.
> >> > > > > I have updated the content of the KIP based on Lianet's and your feedback.
> >> > > > > Please take a look and let me know your thoughts.
> >> > > > >
> >> > > > > Best regards,
> >> > > > > TengYao
> >> > > > >
> >> > > > > On Thu, Aug 29, 2024 at 3:20 PM David Jacot <dja...@confluent.io.invalid> wrote:
> >> > > > >
> >> > > > >> Hi TengYao,
> >> > > > >>
> >> > > > >> Thanks for the update. I haven't fully read it yet but I will soon.
> >> > > > >>
> >> > > > >> LM4: This is incorrect.
> >> > > > >> The consumer must keep its member id during its entire lifetime (until the process stops or dies). The protocol stipulates that a member must rejoin with the same member id and the member epoch set to zero when a FENCED_MEMBER_EPOCH error occurs. This allows the member to resynchronize itself. We should not change this behavior. I think that we should see the client-side generated id as an incarnation id of the application. It is generated once and kept until it stops or dies.
> >> > > > >>
> >> > > > >> Best,
> >> > > > >> David
> >> > > > >>
> >> > > > >> On Thu, Aug 29, 2024 at 6:21 AM TengYao Chi <kiting...@gmail.com> wrote:
> >> > > > >>
> >> > > > >>> Hello Lianet!
> >> > > > >>>
> >> > > > >>> Thanks for the reviews and suggestions!
> >> > > > >>>
> >> > > > >>> LM1: Indeed, we plan to enforce client-side ID generation in the future, and it is not an alternative. I will change the title accordingly.
> >> > > > >>>
> >> > > > >>> LM2: Yes, that's the expectation. I will add that statement to the public interface section.
> >> > > > >>>
> >> > > > >>> LM3: Thank you for the high-level perspective review. I think you're right; our intention isn't very clear since it was placed at the end of the section. I will try to rephrase that section to make it more obvious.
> >> > > > >>>
> >> > > > >>> LM4: Regarding the definition of "session" in this KIP, I believe it refers to the period between the *first-time heartbeat* and when the *consumer leaves the group* (whether through a graceful shutdown or a heartbeat timeout).
> >> > > > >>> The consumer should reuse its UUID if it has been generated before. The only situation in which it will regenerate the UUID is if the coordinator finds that there is already a consumer with the same UUID. IIRC, the coordinator should compare the member epochs, and the later-joined consumer should be fenced off by the coordinator due to having a lower member epoch. Once the consumer receives a `FENCED_MEMBER_EPOCH` error, it will generate a new UUID and attempt to rejoin. I will clarify this in the KIP.
> >> > > > >>>
> >> > > > >>> Thanks again for your reviews, I really appreciate it.
> >> > > > >>>
> >> > > > >>> Best regards,
> >> > > > >>> TengYao
> >> > > > >>>
> >> > > > >>> On Wed, Aug 28, 2024 at 7:12 PM Lianet M. <liane...@gmail.com> wrote:
> >> > > > >>>
> >> > > > >>>> Hello TengYao! Thanks for taking on this issue, we've been going around it for a while.
> >> > > > >>>>
> >> > > > >>>> LM1: About the title of the KIP: "Enable ID Generation for Clients over the ConsumerGroupHeartbeat RPC". I find it confusing because it hints that we're adding it as an alternative (which was discussed and discarded, in favour of really enforcing it). It's also missing the core change imo, which is "where" the generation happens. So, maybe more to the point with something along the lines of "Client-side generated ID for clients over ConsumerGroupHeartbeat RPC"?
> >> > > > >>>>
> >> > > > >>>> LM2: On the public interfaces section, the KIP states that "the server will reject the request", but we should agree on the specific error type. I expect it should fail with an InvalidRequestException, is that the intention? (This was also suggested in the discussion thread before but is not in the KIP).
> >> > > > >>>>
> >> > > > >>>> LM3. Related to my previous point, I find that to be the true public-facing change (member ID mandatory at the protocol level), but it's only at the end of the Public interfaces changes, kind of lost among details of how we're going to do it. Should we rephrase that section with the actual change first, and the hows after? (ex. Bumping the version is not the public-facing change in this case, it's just the mechanism to properly introduce our change.)
> >> > > > >>>>
> >> > > > >>>> LM4. Regarding the lifetime of the UUID: the KIP states we will "Verify that the UUID remains consistent across all subsequent heartbeats during the session". What is this "session" referring to here? I would expect that the UUID is associated to a consumer instance (generated for the consumer the first time it needs to send a HB if it doesn't have the UUID yet). From there on, every time it needs to send a "first HB" again, it will reuse its UUID, is that the intention?
> >> > > > >>>> Note that we should consider that the same consumer instance may have many "first heartbeats", meaning heartbeats to join the group when it's not part of it (ex. consumer unsubscribe + subscribe, fenced, stale). Is this the intention or are you considering the lifetime differently? We should clarify it in the KIP.
> >> > > > >>>>
> >> > > > >>>> Thanks!
> >> > > > >>>>
> >> > > > >>>> Lianet
> >> > > > >>>>
> >> > > > >>>> On Tue, Aug 27, 2024 at 2:27 AM TengYao Chi <kiting...@gmail.com> wrote:
> >> > > > >>>>
> >> > > > >>>>> Hi everyone,
> >> > > > >>>>>
> >> > > > >>>>> I have revised this KIP multiple times based on the feedback from our discussions.
> >> > > > >>>>> I would greatly appreciate it if you could review it when you have the time.
> >> > > > >>>>> If there are no further comments or suggestions, I plan to proceed with initiating a vote soon.
> >> > > > >>>>>
> >> > > > >>>>> Best regards,
> >> > > > >>>>> TengYao
> >> > > > >>>>>
> >> > > > >>>>> On Fri, Aug 23, 2024 at 2:43 PM TengYao Chi <kiting...@gmail.com> wrote:
> >> > > > >>>>>
> >> > > > >>>>>> Hi Andrew,
> >> > > > >>>>>> Thank you for your previous feedback and insights.
> >> > > > >>>>>> Your contributions have added valuable perspectives to the discussions.
> >> > > > >>>>>> And we also benefit from the comparison of different solutions.
> >> > > > >>>>>> I’m also looking forward to seeing an initial version in KIP-932, as it will provide a good reference for future implementations.
> >> > > > >>>>>>
> >> > > > >>>>>> Regarding your comment on AS2, I wanted to clarify that my specification references org.apache.kafka.common.Uuid.
> >> > > > >>>>>> I believe we’re referring to the same class, and it might just be a small oversight due to the busy schedule.
> >> > > > >>>>>>
> >> > > > >>>>>> I want to express my gratitude once again for your many insightful comments, which have helped the discussion progress smoothly.
> >> > > > >>>>>>
> >> > > > >>>>>> Best regards,
> >> > > > >>>>>> TengYao
> >> > > > >>>>>>
> >> > > > >>>>>> On Thu, Aug 22, 2024 at 11:28 PM Andrew Schofield <andrew_schofi...@live.com> wrote:
> >> > > > >>>>>>
> >> > > > >>>>>>> Hi TengYao,
> >> > > > >>>>>>> I’ve been reading through the comments and I’m happy that the lobby approach has not gained support.
> >> > > > >>>>>>>
> >> > > > >>>>>>> Assuming that this KIP is voted, I will be happy to change KIP-932 so that it only supports client-side member ID generation. Because that KIP is still under development, I can do this in the first version of ShareGroupHeartbeat.
> >> > > > >>>>>>>
> >> > > > >>>>>>> AS2: For the encoding section, I suppose the specific encoding which is used is what org.apache.kafka.utils.Uuid uses.
> >> > > > >>>>>>>
> >> > > > >>>>>>> Thanks,
> >> > > > >>>>>>> Andrew
> >> > > > >>>>>>>
> >> > > > >>>>>>>> On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com> wrote:
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> Hello Apoorv,
> >> > > > >>>>>>>> Thank you for your feedback.
> >> > > > >>>>>>>> Regarding the questions you raised, unfortunately, this KIP cannot guarantee the order of heartbeats. As with many classic distributed system challenges, what we can do is make our best effort to ensure that there are no idle members or stale assignments under normal circumstances.
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> As for the lobby approach, I’m not a fan of it because it requires adding a mechanism to maintain client state within the ConsumerGroup, which, in my view, resembles something like a two-phase commit. This would introduce more complexity than the proposal in this KIP, which is something we want to avoid. KIP-848 aims to simplify the existing protocol, and while the lobby approach is a good one, I believe it is not the right fit for this particular situation.
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> Best regards,
> >> > > > >>>>>>>> TengYao
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> On Wed, Aug 14, 2024 at 11:45 PM TengYao Chi <kiting...@gmail.com> wrote:
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>> Hi David,
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> I really appreciate your review and suggestions. As I am still gaining experience in writing KIPs, your input has been incredibly helpful.
> >> > > > >>>>>>>>> I am currently applying your suggestions to the KIP and will complete it as soon as possible.
> >> > > > >>>>>>>>> Regarding the UUID part, I think we haven’t reached a conclusion yet (so far, according to this thread).
> >> > > > >>>>>>>>> However, I will review the current implementation in the Kafka `Uuid` class and include a brief specification in the KIP.
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> Once again, thank you so much for your help.
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> Best regards,
> >> > > > >>>>>>>>> TengYao
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> On Wed, Aug 14, 2024 at 11:14 PM Chia-Ping Tsai <chia7...@gmail.com> wrote:
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>>> hi Apoorv
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>>> As the memberId is now known to the client, and client might send the leave group heartbeat on shutdown prior to receiving the initial heartbeat response. If that's true then how do we guarantee that the 2 requests to join and leave will be processed in order, which could still leave stale members or throw unknown member id exceptions?
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> This is definitely a good question. The short answer: no guarantee, but best effort.
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> Please note that the root cause is "we do not have enough time to wait for the member id (response) when closing the consumer".
> >> > > > >>>>>>>>>> Sadly, we can't guarantee the request order, for the same reason.
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> However, in contrast to the previous behavior, there is one big benefit of the new approach: we can try STONITH because we know the member id.
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> Best,
> >> > > > >>>>>>>>>> Chia-Ping
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> On Wed, Aug 14, 2024 at 8:55 PM Apoorv Mittal <apoorvmitta...@gmail.com> wrote:
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>>> Hi TengYao,
> >> > > > >>>>>>>>>>> Thanks for the KIP. Continuing on the point which Andrew mentioned as AS1.
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> As the memberId is now known to the client, and client might send the leave group heartbeat on shutdown prior to receiving the initial heartbeat response. If that's true then how do we guarantee that the 2 requests to join and leave will be processed in order, which could still leave stale members or throw unknown member id exceptions?
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> Client-side member id generation is helpful, as it gives the client and the broker the same view of the group. But I think the major concern we want to solve here is stale partition assignments, which might still exist with the new approach.
> >> > > > >>>>>>>>>>> I am leaning towards the suggestion mentioned by Andrew, where partition assignment is triggered on the subsequent heartbeat, once the client acknowledges the initial heartbeat (delayed partition assignment).
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> On a separate note, I have a different question. What happens when a client sends the initial heartbeat without a memberId, then crashes and restarts? I think we must be re-triggering assignments and expiring members only after the heartbeat session timeout? If that's true, could delayed partition assignment help us in this situation as well?
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> Regards,
> >> > > > >>>>>>>>>>> Apoorv Mittal
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> On Wed, Aug 14, 2024 at 12:51 PM David Jacot <dja...@confluent.io.invalid> wrote:
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>>> Hi Andrew,
> >> > > > >>>>>>>>>>>>
> >> > > > >>>>>>>>>>>> Personally, I don't like the lobby approach. It makes things more complicated and it would require changing the records on the server too. This is why I initially suggested the rejected alternative #2 which is pretty close but also not perfect.
> >> > > > >>>>>>>>>>>>
> >> > > > >>>>>>>>>>>> I'd like to clarify one thing. The ConsumerGroupHeartbeat API already supports generating the member id on the client so we don't need any conditional logic on the client side. This is actually what we wanted to do in the first place but the idea got pushed back by Magnus back then because generating uuid from librdkafka required a new dependency. It turns out that librdkafka has that dependency today. In retrospect, we should have pushed back on this. Long story short, we can just do it. The proposal in this KIP is to make the member id required in future versions. We could also decide not to do it and to keep supporting both approaches. I would also be fine with this.
> >> > > > >>>>>>>>>>>>
> >> > > > >>>>>>>>>>>> Best,
> >> > > > >>>>>>>>>>>> David
> >> > > > >>>>>>>>>>>>
> >> > > > >>>>>>>>>>>> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield <andrew_schofi...@live.com> wrote:
> >> > > > >>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>> Hi TengYao,
> >> > > > >>>>>>>>>>>>> Thanks for your response. I’ll have just one more try to persuade.
> >> > > > >>>>>>>>>>>>> I feel that I will need to follow the approach with KIP-932 when we’ve made a decision, so I do have more than a passing interest in this.
> >> > > > >>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>> A group member in the lobby is in the group, but it does not have any assignments. A member of a consumer group can have no assigned partitions (such as 5 CG members subscribed to a topic with 4 partitions), so it’s a situation that consumer group members already expect.
> >> > > > >>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>> One of Kafka’s strengths is the way that we handle API versioning. But, there is a cost - the behaviour is different depending on the RPC version. KIP-848 is on the cusp of completion, but we’re already adding conditional logic for v0/v1 for ConsumerGroupHeartbeat. That’s a pity. Only a minor issue, but it’s unfortunate.
> >> > > > >>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>> Thanks,
> >> > > > >>>>>>>>>>>>> Andrew
> >> > > > >>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>> On 14 Aug 2024, at 08:47, TengYao Chi <kiting...@gmail.com> wrote:
> >> > > > >>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>> Hello Andrew
> >> > > > >>>>>>>>>>>>>> Thank you for your thoughtful suggestions and getting the discussion going.
> >> > > > >>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>> To AS1:
> >> > > > >>>>>>>>>>>>>> In the current scenario where the server generates the UUID, if the client shuts down before receiving the memberId generated by the GC (regardless of whether it’s a graceful shutdown or not), the GC will still have to wait for the heartbeat timeout because the client doesn’t know its memberId.
> >> > > > >>>>>>>>>>>>>> This KIP indeed cannot completely resolve the idempotency issue, but it can better handle shutdown scenarios under normal circumstances because the client always knows its memberId. Even if the client shuts down immediately after the initial heartbeat, as long as it performs a graceful shutdown and sends a leave heartbeat, the GC can manage the situation and remove the member.
> >> > > > >>>>>>>>>>>>>> Therefore, the goal of this KIP is to address the issue where the GC has to wait for the heartbeat timeout due to the client leaving without knowing its memberId, which leads to reduced throughput and limited scalability.
> >> > > > >>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>> The solution you suggest has also been proposed by David. The concern with this approach is that it introduces additional complexity for compatibility, as the new server would not immediately add the member to the group, while the old server would. This requires clients to differentiate whether their memberId has been added to the group or not, which could result in unexpected logs.
> >> > > > >>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>> Best regards,
> >> > > > >>>>>>>>>>>>>> TengYao
> >> > > > >>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>> On Wed, Aug 14, 2024 at 12:29 AM Andrew Schofield <andrew_schofi...@live.com> wrote:
> >> > > > >>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>> Hi TengYao,
> >> > > > >>>>>>>>>>>>>>> Thanks for the KIP. I wonder if there’s a different way to close what is quite a small window.
> >> > > > >>>>>>>>>>>>>>> > >> > > > >>>>>>>>>>>>>>> AS1: It is true that the initial heartbeat is not > >> > > > >>> idempotent, > >> > > > >>>>> but > >> > > > >>>>>>>>>>> this > >> > > > >>>>>>>>>>>>>>> remains > >> > > > >>>>>>>>>>>>>>> true with this KIP. It’s just differently not > >> > idempotent. > >> > > > >>> If > >> > > > >>>>> the > >> > > > >>>>>>>>>>>> client > >> > > > >>>>>>>>>>>>>>> makes its > >> > > > >>>>>>>>>>>>>>> own member ID, sends a request and dies, the GC > will > >> > > > >> still > >> > > > >>>> have > >> > > > >>>>>>>>>>> added > >> > > > >>>>>>>>>>>>>>> the member to the group and it will hang around > >> until > >> > the > >> > > > >>>>> session > >> > > > >>>>>>>>>>>>> expires. > >> > > > >>>>>>>>>>>>>>> > >> > > > >>>>>>>>>>>>>>> I wonder if the GC could still generate the member > >> ID > >> > in > >> > > > >>>>>>>>>> response to > >> > > > >>>>>>>>>>>> the > >> > > > >>>>>>>>>>>>>>> first > >> > > > >>>>>>>>>>>>>>> heartbeat, and put the member in a special PENDING > >> > state > >> > > > >>> with > >> > > > >>>>> no > >> > > > >>>>>>>>>>>>>>> assignments until the client sends the next > >> heartbeat, > >> > > > >> thus > >> > > > >>>>>>>>>>> confirming > >> > > > >>>>>>>>>>>>> it > >> > > > >>>>>>>>>>>>>>> has received the member ID. This would not be a > >> > protocol > >> > > > >>>> change > >> > > > >>>>>>>>>> at > >> > > > >>>>>>>>>>>> all, > >> > > > >>>>>>>>>>>>>>> just > >> > > > >>>>>>>>>>>>>>> a change to the GC to keep a new member in the > lobby > >> > > > >> until > >> > > > >>>> it’s > >> > > > >>>>>>>>>>>>> comfirmed > >> > > > >>>>>>>>>>>>>>> it knows its member ID. 
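
To make the PENDING idea above concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not Kafka's group coordinator: the names `GroupCoordinator`, `MemberState`, `heartbeat`, and `assignable_members` are hypothetical, and real heartbeats carry epochs, timeouts, and assignments that are left out here.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class MemberState(Enum):
    PENDING = "pending"  # ID issued, client has not yet confirmed it
    ACTIVE = "active"    # client echoed the ID back in a later heartbeat


@dataclass
class GroupCoordinator:
    """Toy model of a group coordinator (GC); hypothetical, not Kafka code."""
    members: dict = field(default_factory=dict)

    def heartbeat(self, member_id: str = "") -> str:
        if not member_id:
            # First heartbeat: issue an ID but keep the member in the lobby.
            member_id = str(uuid.uuid4())
            self.members[member_id] = MemberState.PENDING
            return member_id
        if member_id not in self.members:
            raise KeyError("UNKNOWN_MEMBER_ID")
        # A later heartbeat shows that the client received its ID.
        self.members[member_id] = MemberState.ACTIVE
        return member_id

    def assignable_members(self) -> list:
        # Only confirmed members take part in assignment.
        return [m for m, s in self.members.items() if s is MemberState.ACTIVE]


gc = GroupCoordinator()
member_id = gc.heartbeat()              # first heartbeat, no member ID yet
pending_view = gc.assignable_members()  # member is still in the lobby
gc.heartbeat(member_id)                 # second heartbeat confirms the ID
active_view = gc.assignable_members()
```

In this model a client that dies before its second heartbeat never receives partitions, so only the pending entry has to expire.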
> >> > > > >>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>> Thanks,
> >> > > > >>>>>>>>>>>>>>> Andrew
> >> > > > >>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>> On 13 Aug 2024, at 15:59, TengYao Chi <kiting...@gmail.com> wrote:
> >> > > > >>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>> Hi Chia-Ping,
> >> > > > >>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>> Thanks for the review and suggestions.
> >> > > > >>>>>>>>>>>>>>>> I have updated the content of the KIP accordingly.
> >> > > > >>>>>>>>>>>>>>>> Please take a look.
> >> > > > >>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>> Best regards,
> >> > > > >>>>>>>>>>>>>>>> TengYao
> >> > > > >>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>> Chia-Ping Tsai <chia7...@apache.org> wrote on Tue, Aug 13, 2024
> >> > > > >>>>>>>>>>>>>>>> at 9:45 PM:
> >> > > > >>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>>> hi TengYao
> >> > > > >>>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>>> thanks for this KIP.
> >> > > > >>>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>>> 1) could you please describe the before/after behavior in the
> >> > > > >>>>>>>>>>>>>>>>> "Proposed Changes" section? IIRC, the current RPC allows the HB
> >> > > > >>>>>>>>>>>>>>>>> to carry a member ID generated by the client, right? If the HB
> >> > > > >>>>>>>>>>>>>>>>> has no member ID, the server will generate one and return it.
> >> > > > >>>>>>>>>>>>>>>>> The new behavior will enforce that the HB "must" have a member
> >> > > > >>>>>>>>>>>>>>>>> ID.
> >> > > > >>>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>>> 2) could you please write the version number explicitly in the
> >> > > > >>>>>>>>>>>>>>>>> KIP?
> >> > > > >>>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>>> 3) how does the new client code handle the old HB? Does it
> >> > > > >>>>>>>>>>>>>>>>> always generate the member ID on the client side even though
> >> > > > >>>>>>>>>>>>>>>>> that is not required?
> >> > > > >>>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>>> Best,
> >> > > > >>>>>>>>>>>>>>>>> Chia-Ping
> >> > > > >>>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>>> On 2024/08/13 06:20:42 TengYao Chi wrote:
> >> > > > >>>>>>>>>>>>>>>>>> Hello everyone,
> >> > > > >>>>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>>>> I would like to start a discussion thread on KIP-1082, which
> >> > > > >>>>>>>>>>>>>>>>>> proposes enabling id generation for clients over the
> >> > > > >>>>>>>>>>>>>>>>>> ConsumerGroupHeartbeat RPC.
> >> > > > >>>>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>>>> Here is the KIP Link: KIP-1082
> >> > > > >>>>>>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC>
> >> > > > >>>>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>>>> Please take a look and let me know what you think, and I would
> >> > > > >>>>>>>>>>>>>>>>>> appreciate any suggestions and feedback.
> >> > > > >>>>>>>>>>>>>>>>>>
> >> > > > >>>>>>>>>>>>>>>>>> Best regards,
> >> > > > >>>>>>>>>>>>>>>>>> TengYao
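
The shutdown scenario discussed in this thread can be sketched as follows: the client generates its member ID once at startup, so it can send a leave heartbeat even if it never saw a response to its first heartbeat. This is a hedged toy model, not Kafka client code. `ToyCoordinator` and `ToyConsumer` are hypothetical names, and using a member epoch of -1 to signal leaving is an assumption based on the KIP-848 protocol.

```python
import uuid


class ToyCoordinator:
    """Hypothetical stand-in for the group coordinator; not Kafka code."""

    LEAVE_EPOCH = -1  # assumption: member epoch -1 signals "leave group"

    def __init__(self):
        self.members = set()

    def heartbeat(self, member_id: str, member_epoch: int) -> None:
        if not member_id:
            raise ValueError("member ID must be set by the client")
        if member_epoch == self.LEAVE_EPOCH:
            # The member is removed at once; no heartbeat timeout is needed.
            self.members.discard(member_id)
        else:
            self.members.add(member_id)


class ToyConsumer:
    def __init__(self, coordinator: ToyCoordinator):
        self.coordinator = coordinator
        # Generated once per process incarnation and reused for every heartbeat.
        self.member_id = str(uuid.uuid4())

    def join(self) -> None:
        self.coordinator.heartbeat(self.member_id, 0)

    def close(self) -> None:
        # Graceful shutdown: works even if no heartbeat response ever arrived.
        self.coordinator.heartbeat(self.member_id, ToyCoordinator.LEAVE_EPOCH)


coordinator = ToyCoordinator()
consumer = ToyConsumer(coordinator)
consumer.join()
joined = set(coordinator.members)
consumer.close()
```

With a server-generated ID, `close()` would have nothing to send until the first response arrived, which is the window the KIP aims to close.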