Hi David,

Thanks for your helpful suggestions, which make this KIP clearer.
I have revised the content according to your feedback. Regarding the fourth point, I agree that the original description was imprecise and could lead to misunderstandings. Since this is not new behavior, I have removed it. Please take a look and let me know what you think.

Best regards,
TengYao

On Fri, Sep 20, 2024 at 2:37 AM David Jacot <dja...@confluent.io.invalid> wrote:
> Hi,
>
> Thanks for the update. I have a few nits:
>
> > If the member ID is null or empty, the server will reject the request with an InvalidRequestException.
>
> We should clarify that this should only apply to version >= 1.
>
> > The consumer instance must generate a member ID, and this ID should remain consistent for the duration of the consumer's session. Here, a "session" is defined as the period from the consumer's first heartbeat until it leaves the group, either through a graceful shutdown, a heartbeat timeout, or the process stopping or dying. The consumer instance should reuse the same member ID for all heartbeats and rejoin attempts to maintain continuity within the group.
>
> This part is not clear to me. When the member leaves the group, it should not reset the member id. I would rather say that the member must generate its member id when it starts and it must keep it until the process stops. It is basically an incarnation of the process.
>
> > If a conflict arises where the member ID generated by the client is detected to be a duplicate within the same group (for example, the same member ID is associated with another active member in the group), the server will handle this by comparing the memberEpoch values of the conflicting members. The member with the lower memberEpoch is considered outdated and will be fenced off by the server. When this occurs, the server responds with a FENCED_MEMBER_EPOCH error to the client, signaling it to rejoin the group with the same member ID while resetting the memberEpoch to zero. This ensures that the client properly resynchronizes and maintains the continuity and consistency of the group membership.
>
> This part is not clear either. It basically says that if a member joins with an existing member id but a different epoch, it will be fenced. Then it must rejoin with the same member id and epoch zero. This is already the current behavior and it does not help with detecting duplicates, right? Should we just remove the paragraph?
>
> > A member ID mismatch occurs within a session: If the server detects a mismatch between the provided member ID and the expected member ID for an ongoing session, it should return an UNKNOWN_MEMBER_ID error.
>
> How could we detect a mismatch between the provided and the expected member id? My understanding is that we can only know whether the provided member id exists or not. This is already implemented.
>
> Thanks,
> David
>
> On Sat, Sep 14, 2024 at 9:31 AM TengYao Chi <kiting...@gmail.com> wrote:
> > Hello everyone,
> >
> > Since this KIP has been fully discussed, I will initiate a vote for it next Monday.
> > Thank you and have a nice weekend.
> >
> > Best regards,
> > TengYao
> >
> > On Thu, Sep 5, 2024 at 2:19 PM TengYao Chi <kiting...@gmail.com> wrote:
> > > Hello everyone,
> > >
> > > KT2: It looks like everyone who has expressed an opinion supports the second option: “Document a recommendation for clients to use UUIDs as member IDs, without strictly enforcing it.”
> > > I have updated the KIP accordingly.
> > > Please take a look, and let me know if you have any thoughts or feedback.
> > >
> > > Thank you!
> > >
> > > Best regards,
> > > TengYao
> > >
> > > On Fri, Aug 30, 2024 at 9:56 PM Chia-Ping Tsai <chia7...@gmail.com> wrote:
> > >> hi TengYao
> > >>
> > >> KT2: +1 to second approach
> > >>
> > >> Best,
> > >> Chia-Ping
> > >>
> > >> On Fri, Aug 30, 2024 at 9:15 PM David Jacot <dja...@confluent.io.invalid> wrote:
> > >> > Hi TengYao,
> > >> >
> > >> > KT2: I don't think that we can realistically validate the uuid on the server. It is basically a string of chars. So I lean towards having a good recommendation in the KIP and in the documentation of the field in the RPC's definition.
> > >> >
> > >> > Best,
> > >> > David
> > >> >
> > >> > On Fri, Aug 30, 2024 at 3:02 PM TengYao Chi <kiting...@gmail.com> wrote:
> > >> > > Hello Kirk!
> > >> > >
> > >> > > Thank you for your comments!
> > >> > >
> > >> > > KT1: Yes, you are correct. The issue is not unique to the initial heartbeat; there can always be cases where the broker might lose connection with a member.
> > >> > >
> > >> > > KT2: Currently, if the client doesn't have a member ID and the memberEpoch equals 0, the coordinator will generate a UUID as the member ID for the client. However, at the RPC level, the member ID is sent as a literal string, meaning there are no restrictions on the format at this level.
> > >> > > This also reminds me that we haven't reached a final conclusion on how to enforce the use of UUIDs.
> > >> > > From our previous discussions, I recall two possible approaches:
> > >> > > The first is to validate the UUID on the server side, and if it's not valid, throw an exception to the client.
> > >> > > The second is to document a recommendation for clients to use UUIDs as member IDs, without strictly enforcing it.
> > >> > > I think it's time to decide on the approach we want to take.
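To make the second option concrete, here is a minimal Python sketch of how a coordinator could treat the MemberId field if the format is only recommended, not enforced: for version >= 1 it rejects only a null or empty ID and accepts any other string, and for version 0 it keeps generating an ID when a member joins (epoch 0) without one. The names `InvalidRequestError` and `resolve_member_id` are hypothetical stand-ins rather than Kafka's API, `uuid.uuid4()` stands in for the coordinator's own generator, and rejecting an empty ID on a v0 heartbeat that is not a join is an assumption.

```python
import uuid
from typing import Optional


class InvalidRequestError(Exception):
    """Hypothetical stand-in for Kafka's InvalidRequestException."""


def resolve_member_id(api_version: int, member_id: Optional[str], member_epoch: int) -> str:
    """Return the member ID a coordinator would use for one heartbeat (sketch)."""
    if api_version >= 1:
        # v1+: the client must send a non-empty ID. The format is deliberately
        # not checked, since the field is just a string; UUIDs are only recommended.
        if not member_id:
            raise InvalidRequestError("MemberId can't be null or empty for version >= 1")
        return member_id
    if member_id:
        return member_id
    if member_epoch == 0:
        # v0 join without an ID: the coordinator generates one for the client.
        return str(uuid.uuid4())
    # Assumption: a v0 heartbeat that is not a join must already carry an ID.
    raise InvalidRequestError("MemberId can't be empty")
```

Under this rule `resolve_member_id(1, "not-a-uuid", 0)` is accepted as-is, which is the trade-off David describes: the server cannot meaningfully validate the format, so the guidance lives in the KIP and in the field's documentation.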
> > >> > >
> > >> > > KT3: Yes, "session" can be considered synonymous with "membership" in this context.
> > >> > >
> > >> > > KT4: Thank you for pointing that out. I will update the wording to specifically say this behavior is for consumers.
> > >> > >
> > >> > > Thanks again for your comments.
> > >> > >
> > >> > > Best regards,
> > >> > > TengYao
> > >> > >
> > >> > > On Fri, Aug 30, 2024 at 12:39 AM Kirk True <k...@kirktrue.pro> wrote:
> > >> > > > Hi TengYao!
> > >> > > >
> > >> > > > Sorry for being late to the discussion...
> > >> > > >
> > >> > > > After reading the thread and then the KIP, I had a few questions/comments:
> > >> > > >
> > >> > > > KT1: In Motivation, it states: “This scenario can result in the broker registering a new member for which it will never receive a proper leave request.” Just to be clear, the broker will always have cases where it might lose connection with a member. That’s not unique to the initial heartbeat, right?
> > >> > > >
> > >> > > > KT2: There was a bit of back and forth about the format of the member ID. From what I gathered in the thread, the member ID is still defined in the RPC as a string and not a UUID, right? The KIP states that the “client must generate a UUID as the member ID” and that the “server will validate that a valid UUID is provided.” Is that a change for the server, or is it already enforced as a UUID?
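The member ID lifecycle David describes in this thread (generate the ID once when the consumer starts, keep it until the process stops, and on FENCED_MEMBER_EPOCH rejoin with the same ID and the epoch reset to zero) can be sketched as a small state holder. This is a hypothetical illustration: `ConsumerMembership` and `HeartbeatRequest` are made-up names, errors are plain strings, and using epoch -1 for a leave heartbeat is assumed from the KIP-848 ConsumerGroupHeartbeat convention rather than stated in this thread.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class HeartbeatRequest:
    member_id: str
    member_epoch: int


@dataclass
class ConsumerMembership:
    # Generated once per consumer instance and kept until the process stops:
    # an "incarnation ID" that survives leaving, fencing and rejoining.
    member_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    member_epoch: int = 0

    def join_request(self) -> HeartbeatRequest:
        # Every join (first start, resubscribe, after fencing) uses epoch 0
        # and the same member ID.
        self.member_epoch = 0
        return HeartbeatRequest(self.member_id, 0)

    def on_response(self, error: Optional[str], member_epoch: int) -> Optional[HeartbeatRequest]:
        """Apply a heartbeat response; return a follow-up request if one is needed."""
        if error == "FENCED_MEMBER_EPOCH":
            # Resynchronize: same member ID, epoch reset to zero.
            return self.join_request()
        if error is None:
            self.member_epoch = member_epoch
        return None

    def leave_request(self) -> HeartbeatRequest:
        # Assumed convention: epoch -1 asks the coordinator to remove the member.
        return HeartbeatRequest(self.member_id, -1)
```

Because the ID lives on the instance rather than on a single membership, the same consumer can go through many "first heartbeats" (Lianet's unsubscribe + subscribe, fenced, or stale cases) without ever changing it.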
> > >> > > >
> > >> > > > KT3: Lianet mentioned some confusion over the use of the word “session.” Isn’t “session” synonymous with “membership?”
> > >> > > >
> > >> > > > KT4: Under “Member ID Lifecycle,” it states: “The client should reuse the same UUID as the member ID for all heartbeats and rejoin attempts to maintain continuity within the group.” Could we change the first part of that to “The Consumer instance should…” We do have lifetimes that extend past the lifetime of a client instance (such as the transaction ID).
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Kirk
> > >> > > >
> > >> > > > > On Aug 29, 2024, at 1:28 AM, TengYao Chi <kiting...@gmail.com> wrote:
> > >> > > > >
> > >> > > > > Hi David,
> > >> > > > >
> > >> > > > > Thank you for pointing that out.
> > >> > > > > I have updated the content of the KIP based on Lianet's and your feedback.
> > >> > > > > Please take a look and let me know your thoughts.
> > >> > > > >
> > >> > > > > Best regards,
> > >> > > > > TengYao
> > >> > > > >
> > >> > > > > On Thu, Aug 29, 2024 at 3:20 PM David Jacot <dja...@confluent.io.invalid> wrote:
> > >> > > > >>
> > >> > > > >> Hi TengYao,
> > >> > > > >>
> > >> > > > >> Thanks for the update. I haven't fully read it yet but I will soon.
> > >> > > > >>
> > >> > > > >> LM4: This is incorrect. The consumer must keep its member id during its entire lifetime (until the process stops or dies). The protocol stipulates that a member must rejoin with the same member id and the member epoch set to zero when a FENCED_MEMBER_EPOCH error occurs. This allows the member to resynchronize itself. We should not change this behavior. I think that we should see the client-side generated id as an incarnation id of the application. It is generated once and kept until it stops or dies.
> > >> > > > >>
> > >> > > > >> Best,
> > >> > > > >> David
> > >> > > > >>
> > >> > > > >> On Thu, Aug 29, 2024 at 6:21 AM TengYao Chi <kiting...@gmail.com> wrote:
> > >> > > > >>
> > >> > > > >>> Hello Lianet!
> > >> > > > >>>
> > >> > > > >>> Thanks for the reviews and suggestions!
> > >> > > > >>>
> > >> > > > >>> LM1: Indeed, we plan to enforce client-side ID generation in the future, and it is not an alternative. I will change the title accordingly.
> > >> > > > >>>
> > >> > > > >>> LM2: Yes, that's the expectation. I will add that statement to the public interface section.
> > >> > > > >>>
> > >> > > > >>> LM3: Thank you for the high-level perspective review. I think you're right; our intention isn't very clear since it was placed at the end of the section. I will try to rephrase that section to make it more obvious.
> > >> > > > >>>
> > >> > > > >>> LM4: Regarding the definition of "session" in this KIP, I believe it refers to the period between the *first-time heartbeat* and when the *consumer leaves the group* (whether through a graceful shutdown or a heartbeat timeout). The consumer should reuse its UUID if it has been generated before. The only situation in which it will regenerate the UUID is if the coordinator finds that there is already a consumer with the same UUID.
> > >> > > > >>> IIRC, the coordinator should compare the member epochs, and the later-joined consumer should be fenced off by the coordinator due to having a lower member epoch. Once the consumer receives a `FENCED_MEMBER_EPOCH` error, it will generate a new UUID and attempt to rejoin. I will clarify this in the KIP.
> > >> > > > >>>
> > >> > > > >>> Thanks again for your reviews, I really appreciate it.
> > >> > > > >>>
> > >> > > > >>> Best regards,
> > >> > > > >>> TengYao
> > >> > > > >>>
> > >> > > > >>> On Wed, Aug 28, 2024 at 7:12 PM Lianet M. <liane...@gmail.com> wrote:
> > >> > > > >>>
> > >> > > > >>>> Hello TengYao! Thanks for taking on this issue, we've been going around it for a while.
> > >> > > > >>>>
> > >> > > > >>>> LM1: About the title of the KIP: "Enable ID Generation for Clients over the ConsumerGroupHeartbeat RPC". I find it confusing because it hints that we're adding it as an alternative (which was discussed and discarded, in favour of really enforcing it). It's also missing the core change imo, which is "where" the generation happens. So, maybe more to the point with something along the lines of "Client-side generated ID for clients over ConsumerGroupHeartbeat RPC"?
> > >> > > > >>>>
> > >> > > > >>>> LM2: On the public interfaces section, the KIP states that "the server will reject the request", but we should agree on the specific error type. I expect it should fail with an InvalidRequestException, is that the intention? (This was also suggested in the discussion thread before but is not in the KIP.)
> > >> > > > >>>>
> > >> > > > >>>> LM3. Related to my previous point, I find that to be the true public-facing change (member ID mandatory at the protocol level), but it's only at the end of the Public interfaces changes, kind of lost among details of how we're going to do it. Should we rephrase that section with the actual change first, and the hows after? (ex. Bumping the version is not the public-facing change in this case, it's just the mechanism to properly introduce our change.)
> > >> > > > >>>>
> > >> > > > >>>> LM4. Regarding the lifetime of the UUID: the KIP states we will "Verify that the UUID remains consistent across all subsequent heartbeats during the session". What is this "session" referring to here? I would expect that the UUID is associated to a consumer instance (generated for the consumer the first time it needs to send a HB if it doesn't have the UUID yet). From there on, every time it needs to send a "first HB" again, it will reuse its UUID, is that the intention? Note that we should consider that the same consumer instance may have many "first heartbeats", meaning heartbeats to join the group when it's not part of it (ex. consumer unsubscribe + subscribe, fenced, stale). Is this the intention or are you considering the lifetime differently? We should clarify it in the KIP.
> > >> > > > >>>>
> > >> > > > >>>> Thanks!
> > >> > > > >>>>
> > >> > > > >>>> Lianet
> > >> > > > >>>>
> > >> > > > >>>> On Tue, Aug 27, 2024 at 2:27 AM TengYao Chi <kiting...@gmail.com> wrote:
> > >> > > > >>>>
> > >> > > > >>>>> Hi everyone,
> > >> > > > >>>>>
> > >> > > > >>>>> I have revised this KIP multiple times based on the feedback from our discussions.
> > >> > > > >>>>> I would greatly appreciate it if you could review it when you have the time.
> > >> > > > >>>>> If there are no further comments or suggestions, I plan to proceed with initiating a vote soon.
> > >> > > > >>>>>
> > >> > > > >>>>> Best regards,
> > >> > > > >>>>> TengYao
> > >> > > > >>>>>
> > >> > > > >>>>> On Fri, Aug 23, 2024 at 2:43 PM TengYao Chi <kiting...@gmail.com> wrote:
> > >> > > > >>>>>
> > >> > > > >>>>>> Hi Andrew,
> > >> > > > >>>>>> Thank you for your previous feedback and insights.
> > >> > > > >>>>>> Your contributions have added valuable perspectives to the discussions.
> > >> > > > >>>>>> And we also benefit from the comparison of different solutions.
> > >> > > > >>>>>> I’m also looking forward to seeing an initial version in KIP-932, as it will provide a good reference for future implementations.
> > >> > > > >>>>>>
> > >> > > > >>>>>> Regarding your comment on AS2, I wanted to clarify that my specification references org.apache.kafka.common.Uuid.
> > >> > > > >>>>>> I believe we’re referring to the same class, and it might just be a small oversight due to the busy schedule.
> > >> > > > >>>>>>
> > >> > > > >>>>>> I want to express my gratitude once again for your many insightful comments, which have helped the discussion progress smoothly.
> > >> > > > >>>>>>
> > >> > > > >>>>>> Best regards,
> > >> > > > >>>>>> TengYao
> > >> > > > >>>>>>
> > >> > > > >>>>>> On Thu, Aug 22, 2024 at 11:28 PM Andrew Schofield <andrew_schofi...@live.com> wrote:
> > >> > > > >>>>>>
> > >> > > > >>>>>>> Hi TengYao,
> > >> > > > >>>>>>> I’ve been reading through the comments and I’m happy that the lobby approach has not gained support.
> > >> > > > >>>>>>>
> > >> > > > >>>>>>> Assuming that this KIP is voted, I will be happy to change KIP-932 so that it only supports client-side member ID generation. Because that KIP is still under development, I can do this in the first version of ShareGroupHeartbeat.
> > >> > > > >>>>>>>
> > >> > > > >>>>>>> AS2: For the encoding section, I suppose the specific encoding which is used is what org.apache.kafka.utils.Uuid uses.
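On AS2: as understood here, org.apache.kafka.common.Uuid renders its 16 bytes (most-significant half first) as URL-safe Base64 without padding, which yields a 22-character string; check the class itself before relying on this. The Python sketch below produces IDs in that shape from a random version-4 UUID. The function names are hypothetical, and skipping IDs that start with '-' is an assumption modeled on what recent Kafka versions are believed to do, not something this thread specifies.

```python
import base64
import uuid


def encode_kafka_style(raw: bytes) -> str:
    """URL-safe Base64 of 16 bytes with the '=' padding stripped (22 chars)."""
    if len(raw) != 16:
        raise ValueError("a Kafka-style Uuid is exactly 16 bytes")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def random_member_id() -> str:
    """Generate a client-side member ID in the Kafka Uuid string format (sketch)."""
    while True:
        # uuid4().bytes is big-endian: the most-significant 64 bits come first.
        encoded = encode_kafka_style(uuid.uuid4().bytes)
        # Assumption: skip IDs starting with '-' so they are safe as CLI arguments.
        if not encoded.startswith("-"):
            return encoded
```

For example, `encode_kafka_style(bytes(16))` returns 22 `A` characters, and any value from `random_member_id()` decodes back to 16 bytes once the `==` padding is re-added.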
> > >> > > > >>>>>>>
> > >> > > > >>>>>>> Thanks,
> > >> > > > >>>>>>> Andrew
> > >> > > > >>>>>>>
> > >> > > > >>>>>>>> On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com> wrote:
> > >> > > > >>>>>>>>
> > >> > > > >>>>>>>> Hello Apoorv,
> > >> > > > >>>>>>>> Thank you for your feedback.
> > >> > > > >>>>>>>> Regarding the questions you raised, unfortunately, this KIP cannot guarantee the order of heartbeats. As with many classic distributed system challenges, what we can do is make our best effort to ensure that there are no idle members or stale assignments under normal circumstances.
> > >> > > > >>>>>>>>
> > >> > > > >>>>>>>> As for the lobby approach, I’m not a fan of it because it requires adding a mechanism to maintain client state within the ConsumerGroup, which, in my view, resembles something like a two-phase commit. This would introduce more complexity than the proposal in this KIP, which is something we want to avoid. KIP-848 aims to simplify the existing protocol, and while the lobby approach is a good one, I believe it is not the right fit for this particular situation.
> > >> > > > >>>>>>>>
> > >> > > > >>>>>>>> Best regards,
> > >> > > > >>>>>>>> TengYao
> > >> > > > >>>>>>>>
> > >> > > > >>>>>>>> On Wed, Aug 14, 2024 at 11:45 PM TengYao Chi <kiting...@gmail.com> wrote:
> > >> > > > >>>>>>>>
> > >> > > > >>>>>>>>> Hi David,
> > >> > > > >>>>>>>>>
> > >> > > > >>>>>>>>> I really appreciate your review and suggestions. As I am still gaining experience in writing KIPs, your input has been incredibly helpful. I am currently applying your suggestions to the KIP and will complete it as soon as possible.
> > >> > > > >>>>>>>>> Regarding the UUID part, I think we haven’t reached a conclusion yet (at least not so far in this thread).
> > >> > > > >>>>>>>>> However, I will review the current implementation in the Kafka `Uuid` class and include a brief specification in the KIP.
> > >> > > > >>>>>>>>>
> > >> > > > >>>>>>>>> Once again, thank you so much for your help.
> > >> > > > >>>>>>>>>
> > >> > > > >>>>>>>>> Best regards,
> > >> > > > >>>>>>>>> TengYao
> > >> > > > >>>>>>>>>
> > >> > > > >>>>>>>>> On Wed, Aug 14, 2024 at 11:14 PM Chia-Ping Tsai <chia7...@gmail.com> wrote:
> > >> > > > >>>>>>>>>
> > >> > > > >>>>>>>>>> hi Apoorv
> > >> > > > >>>>>>>>>>
> > >> > > > >>>>>>>>>>> As the memberId is now known to the client, the client might send the leave group heartbeat on shutdown prior to receiving the initial heartbeat response. If that's true then how do we guarantee that the 2 requests to join and leave will be processed in order, which could still leave stale members or throw unknown member id exceptions?
> > >> > > > >>>>>>>>>>
> > >> > > > >>>>>>>>>> This is definitely a good question. The short answer: no guarantee, but best effort.
> > >> > > > >>>>>>>>>>
> > >> > > > >>>>>>>>>> Please note that the root cause is "we don't have enough time to wait for the member id (response) when closing the consumer". Sadly, we can't guarantee the request order for the same reason.
> > >> > > > >>>>>>>>>>
> > >> > > > >>>>>>>>>> However, in contrast to the previous behavior, there is one big benefit of the new approach: we can try STONITH because we know the member id.
> > >> > > > >>>>>>>>>>
> > >> > > > >>>>>>>>>> Best,
> > >> > > > >>>>>>>>>> Chia-Ping
> > >> > > > >>>>>>>>>>
> > >> > > > >>>>>>>>>> On Wed, Aug 14, 2024 at 8:55 PM Apoorv Mittal <apoorvmitta...@gmail.com> wrote:
> > >> > > > >>>>>>>>>>
> > >> > > > >>>>>>>>>>> Hi TengYao,
> > >> > > > >>>>>>>>>>> Thanks for the KIP. Continuing on the point which Andrew mentioned as AS1.
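The ordering problem Apoorv and Chia-Ping discuss can be shown with a toy coordinator that applies heartbeats in whatever order they arrive. In this hypothetical model (not Kafka code), epoch 0 means join and epoch -1 means leave. Because the client generated its ID itself, it can send the leave without waiting for the join response; if the leave is processed first, the coordinator rejects it with UNKNOWN_MEMBER_ID and the later join leaves a stale member behind until the session times out.

```python
from typing import Dict, List, Tuple

# (member_id, member_epoch): epoch 0 joins, epoch -1 leaves in this toy model.
Heartbeat = Tuple[str, int]


def apply_heartbeats(arrivals: List[Heartbeat]) -> Tuple[Dict[str, int], List[str]]:
    """Process heartbeats in arrival order; return remaining members and errors."""
    members: Dict[str, int] = {}
    errors: List[str] = []
    for member_id, epoch in arrivals:
        if epoch == 0:
            # Join: register the member (epoch 1 is an illustrative value).
            members[member_id] = 1
        elif epoch == -1:
            if member_id in members:
                del members[member_id]
            else:
                # Leave for a member the coordinator has not seen yet.
                errors.append("UNKNOWN_MEMBER_ID")
    return members, errors


join, leave = ("member-a", 0), ("member-a", -1)
print(apply_heartbeats([join, leave]))  # → ({}, [])
print(apply_heartbeats([leave, join]))  # → ({'member-a': 1}, ['UNKNOWN_MEMBER_ID'])
```

This is why the thread treats the KIP as best effort: it lets a gracefully closing client send a leave it could not send before, but it cannot rule out reordering.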
> > >> > > > >>>>>>>>>>>
> > >> > > > >>>>>>>>>>> As the memberId is now known to the client, the client might send the leave group heartbeat on shutdown prior to receiving the initial heartbeat response. If that's true then how do we guarantee that the 2 requests to join and leave will be processed in order, which could still leave stale members or throw unknown member id exceptions?
> > >> > > > >>>>>>>>>>>
> > >> > > > >>>>>>>>>>> Client-side member id generation is helpful, as it gives the client and the broker the same view of the group. But I think the major concern we want to solve here is Stale Partition Assignments, which might still exist with the new approach. I am leaning towards the suggestion mentioned by Andrew, where partition assignment is triggered on the subsequent heartbeat, once the client acknowledges the initial heartbeat: delayed partition assignment.
> > >> > > > >>>>>>>>>>>
> > >> > > > >>>>>>>>>>> Though on a separate note, I have a different question. What happens when there is an issue with the client which sends the initial heartbeat without memberId, then crashes and restarts? I think we must be re-triggering assignments and expiring members only after the heartbeat session timeout? If that's true, could delayed partition assignment benefit us in this situation as well?
> > >> > > > >>>>>>>>>>>
> > >> > > > >>>>>>>>>>> Regards,
> > >> > > > >>>>>>>>>>> Apoorv Mittal
> > >> > > > >>>>>>>>>>>
> > >> > > > >>>>>>>>>>> On Wed, Aug 14, 2024 at 12:51 PM David Jacot <dja...@confluent.io.invalid> wrote:
> > >> > > > >>>>>>>>>>>
> > >> > > > >>>>>>>>>>>> Hi Andrew,
> > >> > > > >>>>>>>>>>>>
> > >> > > > >>>>>>>>>>>> Personally, I don't like the lobby approach. It makes things more complicated and it would require changing the records on the server too. This is why I initially suggested the rejected alternative #2 which is pretty close but also not perfect.
> > >> > > > >>>>>>>>>>>>
> > >> > > > >>>>>>>>>>>> I'd like to clarify one thing. The ConsumerGroupHeartbeat API already supports generating the member id on the client so we don't need any conditional logic on the client side.
> > >> > > > >>>>>>>>>>>> This is actually what we wanted to do in the first place but the idea got pushed back by Magnus back then because generating uuid from librdkafka required a new dependency. It turns out that librdkafka has that dependency today. In retrospect, we should have pushed back on this. Long story short, we can just do it. The proposal in this KIP is to make the member id required in future versions. We could also decide not to do it and to keep supporting both approaches. I would also be fine with this.
> > >> > > > >>>>>>>>>>>>
> > >> > > > >>>>>>>>>>>> Best,
> > >> > > > >>>>>>>>>>>> David
> > >> > > > >>>>>>>>>>>>
> > >> > > > >>>>>>>>>>>> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield <andrew_schofi...@live.com> wrote:
> > >> > > > >>>>>>>>>>>>
> > >> > > > >>>>>>>>>>>>> Hi TengYao,
> > >> > > > >>>>>>>>>>>>> Thanks for your response. I’ll have just one more try to persuade.
> > >> > > > >>>>>>>>>>>>> I feel that I will need to follow the approach with KIP-932 when we’ve made a decision, so I do have more than a passing interest in this.
> > >> > > > >>>>>>>>>>>>>
> > >> > > > >>>>>>>>>>>>> A group member in the lobby is in the group, but it does not have any assignments. A member of a consumer group can have no assigned partitions (such as 5 CG members subscribed to a topic with 4 partitions), so it’s a situation that consumer group members already expect.
> > >> > > > >>>>>>>>>>>>>
> > >> > > > >>>>>>>>>>>>> One of Kafka’s strengths is the way that we handle API versioning. But there is a cost: the behaviour is different depending on the RPC version. KIP-848 is on the cusp of completion, but we’re already adding conditional logic for v0/v1 for ConsumerGroupHeartbeat. That’s a pity. Only a minor issue, but it’s unfortunate.
> > >> > > > >>>>>>>>>>>>>
> > >> > > > >>>>>>>>>>>>> Thanks,
> > >> > > > >>>>>>>>>>>>> Andrew
> > >> > > > >>>>>>>>>>>>>
> > >> > > > >>>>>>>>>>>>>> On 14 Aug 2024, at 08:47, TengYao Chi <kiting...@gmail.com> wrote:
> > >> > > > >>>>>>>>>>>>>>
> > >> > > > >>>>>>>>>>>>>> Hello Andrew,
> > >> > > > >>>>>>>>>>>>>> Thank you for your thoughtful suggestions and for getting the discussion going.
> > >> > > > >>>>>>>>>>>>>> > > >> > > > >>>>>>>>>>>>>> To AS1: > > >> > > > >>>>>>>>>>>>>> In the current scenario where the server > generates > > >> the > > >> > > > >> UUID, > > >> > > > >>>> if > > >> > > > >>>>>>>>>> the > > >> > > > >>>>>>>>>>>>> client > > >> > > > >>>>>>>>>>>>>> shuts down before receiving the memberId > generated > > by > > >> > the > > >> > > > >> GC > > >> > > > >>>>>>>>>>>> (regardless > > >> > > > >>>>>>>>>>>>> of > > >> > > > >>>>>>>>>>>>>> whether it’s a graceful shutdown or not), the GC > > will > > >> > > > >> still > > >> > > > >>>> have > > >> > > > >>>>>>>>>> to > > >> > > > >>>>>>>>>>>> wait > > >> > > > >>>>>>>>>>>>>> for the heartbeat timeout because the client > > doesn’t > > >> > know > > >> > > > >>> its > > >> > > > >>>>>>>>>>> memberId. > > >> > > > >>>>>>>>>>>>>> This KIP indeed cannot completely resolve the > > >> > idempotency > > >> > > > >>>> issue, > > >> > > > >>>>>>>>>> but > > >> > > > >>>>>>>>>>> it > > >> > > > >>>>>>>>>>>>> can > > >> > > > >>>>>>>>>>>>>> better handle shutdown scenarios under normal > > >> > > > >> circumstances > > >> > > > >>>>>>>>>> because > > >> > > > >>>>>>>>>>> the > > >> > > > >>>>>>>>>>>>>> client always knows its memberId. Even if the > > client > > >> > shuts > > >> > > > >>>> down > > >> > > > >>>>>>>>>>>>> immediately > > >> > > > >>>>>>>>>>>>>> after the initial heartbeat, as long as it > > performs a > > >> > > > >>> graceful > > >> > > > >>>>>>>>>>> shutdown > > >> > > > >>>>>>>>>>>>> and > > >> > > > >>>>>>>>>>>>>> sends a leave heartbeat, the GC can manage the > > >> situation > > >> > > > >> and > > >> > > > >>>>>>>>>> remove > > >> > > > >>>>>>>>>>> the > > >> > > > >>>>>>>>>>>>>> member. 
Therefore, the goal of this KIP is to address the issue where the GC has to wait for the heartbeat timeout because the client left without knowing its memberId, which leads to reduced throughput and limited scalability.

The solution you suggest has also been proposed by David. The concern with this approach is that it introduces additional complexity for compatibility, as the new server would not immediately add the member to the group, while the old server would. This requires clients to differentiate whether their memberId has been added to the group or not, which could result in unexpected logs.

Best Regards,
TengYao

On Wed, 14 Aug 2024 at 12:29 AM, Andrew Schofield <andrew_schofi...@live.com> wrote:

Hi TengYao,
Thanks for the KIP. I wonder if there's a different way to close what is quite a small window.
AS1: It is true that the initial heartbeat is not idempotent, but this remains true with this KIP. It's just differently not idempotent. If the client makes its own member ID, sends a request and dies, the GC will still have added the member to the group, and it will hang around until the session expires.

I wonder if the GC could still generate the member ID in response to the first heartbeat, and put the member in a special PENDING state with no assignments until the client sends the next heartbeat, thus confirming it has received the member ID. This would not be a protocol change at all, just a change to the GC to keep a new member in the lobby until it's confirmed it knows its member ID.
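To make the lobby idea concrete, here is a minimal, hypothetical Python model of a coordinator that still issues the member ID itself but parks the new member in a PENDING state, with no partitions, until the next heartbeat echoes that ID back. The class, method, and state names are illustrative assumptions rather than Kafka's group coordinator API, and the simple round-robin assignment only stands in for a real assignor.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class MemberState(Enum):
    PENDING = "pending"  # ID issued by the coordinator but not yet echoed back
    ACTIVE = "active"    # client has shown it knows its ID; eligible for partitions


@dataclass
class Member:
    member_id: str
    state: MemberState
    assigned_partitions: list = field(default_factory=list)


class LobbyCoordinatorSketch:
    """Hypothetical model of the PENDING 'lobby' idea; not Kafka's coordinator API."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.members = {}

    def heartbeat(self, member_id=""):
        if not member_id:
            # First heartbeat: the coordinator generates the ID and parks the member.
            new_id = str(uuid.uuid4())
            self.members[new_id] = Member(new_id, MemberState.PENDING)
            return new_id
        member = self.members.get(member_id)
        if member is None:
            raise KeyError(f"unknown member id {member_id}")
        if member.state is MemberState.PENDING:
            # The echoed ID confirms receipt, so the member leaves the lobby.
            member.state = MemberState.ACTIVE
            self._assign()
        return member_id

    def _assign(self):
        # Round-robin over ACTIVE members only; PENDING members keep no partitions.
        active = sorted(m.member_id for m in self.members.values()
                        if m.state is MemberState.ACTIVE)
        for m in self.members.values():
            m.assigned_partitions = []
        if not active:
            return
        for i, partition in enumerate(self.partitions):
            self.members[active[i % len(active)]].assigned_partitions.append(partition)


coordinator = LobbyCoordinatorSketch(partitions=[0, 1, 2, 3])
member_id = coordinator.heartbeat()  # first heartbeat carries no ID; coordinator issues one
coordinator.heartbeat(member_id)     # second heartbeat echoes it back, promoting the member
```

In this model a client that dies after the first heartbeat leaves only a PENDING entry that never received partitions; the entry itself still lingers until some expiry, which matches the point that the initial heartbeat stays non-idempotent either way.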
Thanks,
Andrew

On 13 Aug 2024, at 15:59, TengYao Chi <kiting...@gmail.com> wrote:

Hi Chia-Ping,

Thanks for the review and suggestions.
I have updated the content of the KIP accordingly.
Please take a look.

Best regards,
TengYao

On Tue, 13 Aug 2024 at 9:45 PM, Chia-Ping Tsai <chia7...@apache.org> wrote:

Hi TengYao,

Thanks for this KIP.

1) Could you please describe the before/after behavior in the "Proposed Changes" section? IIRC, the current RPC allows a HB to carry a member ID generated by the client, right? If the HB has no member ID, the server will generate one and return it. The new behavior will enforce that the HB "must" have a member ID.
2) Could you please write the version number explicitly in the KIP?

3) How does the new client code handle the old HB? Does it always generate the member ID on the client side even though that is not required?

Best,
Chia-Ping

On 2024/08/13 06:20:42 TengYao Chi wrote:

Hello everyone,

I would like to start a discussion thread on KIP-1082, which proposes enabling ID generation for clients over the ConsumerGroupHeartbeat RPC.

Here is the KIP link: KIP-1082
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC>

Please take a look and let me know what you think. I would appreciate any suggestions and feedback.
Best regards,
TengYao
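Unlike Andrew's PENDING-state alternative, the KIP-1082 flow discussed in this thread has the client create its member ID once per process, send it on every heartbeat, and therefore always be able to send a leave heartbeat on graceful shutdown, while the server rejects an empty member ID from version 1 onward. The following is a minimal, hypothetical Python model of that flow. The class names, the `HeartbeatRequest` shape, and the use of member epoch -1 to signal leaving (borrowed from KIP-848) are assumptions for illustration, not the actual Kafka client or coordinator code.

```python
import uuid
from dataclasses import dataclass

# Assumption: as in KIP-848, a member epoch of -1 asks the coordinator to remove the member.
LEAVE_GROUP_MEMBER_EPOCH = -1


class InvalidRequestError(Exception):
    """Stand-in for the InvalidRequestException the server returns in this sketch."""


@dataclass(frozen=True)
class HeartbeatRequest:
    version: int
    member_id: str
    member_epoch: int


class CoordinatorSketch:
    """Hypothetical server side of the client-generated member ID flow."""

    def __init__(self):
        self.members = set()

    def handle(self, request):
        if request.version >= 1 and not request.member_id:
            # From version 1 on, an empty member ID is rejected.
            raise InvalidRequestError("member ID must be non-empty for version >= 1")
        # Version 0 keeps the old behaviour: the server generates a missing ID.
        member_id = request.member_id or str(uuid.uuid4())
        if request.member_epoch == LEAVE_GROUP_MEMBER_EPOCH:
            self.members.discard(member_id)  # removed at once, no session timeout needed
        else:
            self.members.add(member_id)
        return member_id


class ConsumerSketch:
    """Hypothetical client: one member ID per process incarnation."""

    def __init__(self, coordinator, version=1):
        self.coordinator = coordinator
        self.version = version
        # Generated once at startup and reused for every heartbeat until the process stops.
        self.member_id = str(uuid.uuid4())

    def heartbeat(self, member_epoch=0):
        return self.coordinator.handle(
            HeartbeatRequest(self.version, self.member_id, member_epoch))

    def close(self):
        # The client already knows its ID, so even a shutdown right after the
        # first heartbeat can send a leave heartbeat.
        self.heartbeat(LEAVE_GROUP_MEMBER_EPOCH)


coordinator = CoordinatorSketch()
consumer = ConsumerSketch(coordinator)
consumer.heartbeat()  # joins with its own ID
consumer.close()      # leave heartbeat removes it immediately
```

The design choice this illustrates is that the leave path no longer depends on the first response arriving: the ID exists before any request is sent, so the only case left to the session timeout is a process that dies without sending the leave heartbeat.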