Hi David, Thank you for pointing that out. I have updated the content of the KIP based on Lianet's and your feedback. Please take a look and let me know your thoughts.
Best regards, TengYao David Jacot <dja...@confluent.io.invalid> 於 2024年8月29日 週四 下午3:20寫道: > Hi TengYao, > > Thanks for the update. I haven't fully read it yet but I will soon. > > LM4: This is incorrect. The consumer must keep its member id during its > entire lifetime (until the process stops or dies). The protocol stipulates > that a member must rejoin with the same member id and the member epoch set > to zero when an FENCED_MEMBER_EPOCH occurs. This allows the member to > resynchronize itself. We should not change this behavior. I think that we > should see the client side generation id as an incarnation id of the > application. It is generated once and kept until it stops or dies. > > Best, > David > > On Thu, Aug 29, 2024 at 6:21 AM TengYao Chi <kiting...@gmail.com> wrote: > > > Hello Lianet ! > > > > Thanks for the reviews and suggestions! > > > > LM1: Indeed, we plan to enforce client-side ID generation in the future, > > and it is not an alternative. I will change the title accordingly. > > > > LM2: Yes, that's the expectation. I will add that statement to the public > > interface section. > > > > LM3: Thank you for the high-level perspective review. I think you're > right; > > our intention isn't very clear since it was placed at the end of the > > section. I will try to rephrase that section to make it more obvious. > > > > LM4: Regarding the definition of "session" in this KIP, I believe it > refers > > to the period between the *first-time heartbeat* and when the *consumer > > leaves the group* (whether through a graceful shutdown or a heartbeat > > timeout). The consumer should reuse its UUID if it has been generated > > before. The only situation in which it will regenerate the UUID is if the > > coordinator finds that there is already a consumer with the same UUID. > > IIRC, the coordinator should compare the member epochs, and the > > later-joined consumer should be fenced off by the coordinator due to > having > > a lower member epoch. Once the consumer receives a `FENCED_MEMBER_EPOCH` > > error, it will generate a new UUID and attempt to rejoin. I will clarify > > this in the KIP. > > > > Thanks again for your reviews, I really appreciate it. > > > > Best regards, > > TengYao > > > > Lianet M. <liane...@gmail.com> 於 2024年8月28日 週三 下午7:12寫道: > > > > > Hello TengYao! Thanks for taking on this issue, we've been going around > > it > > > for a while. > > > > > > LM1: About the title of the KIP: "Enable ID Generation for Clients over > > the > > > ConsumerGroupHeartbeat RPC". I find it confusing because it hints that > > > we're adding it as an alternative (which was discussed and discarded, > in > > > favour of really enforcing it). It's also missing the core change imo, > > > which is "where" the generation happens. So, maybe more to the point > with > > > something along the lines of "Client-side generated ID for clients over > > > ConsumerGroupHeartbeat RPC"? > > > > > > LM2: On the public interfaces section, the KIP states that "the server > > will > > > reject the request", but we should agree on the specific error type. I > > > expect it should fail with an InvalidRequestException, is that the > > > intention? (This was also suggested in the discussion thread before but > > is > > > not in the KIP). > > > > > > LM3. Related to my previous point, I find that to be the true > > public-facing > > > change (member ID mandatory at the protocol level), but it's only at > the > > > end of the Public interfaces changes, kind of lost among details of how > > > we're going to do it. Should we rephrase that section with the actual > > > change first, and the hows after (ex. Bumping the version is not the > > > public-facing change in this case, it's just the mechanism to properly > > > introduce our change) > > > > > > LM4. Regarding the lifetime of the UUID: the KIP states we will "Verify > > > that the UUID remains consistent across all subsequent heartbeats > during > > > the session". What is this "session" referring to here? I would expect > > that > > > the UUID is associated to a consumer instance (generated for the > consumer > > > the first time it needs to send a HB if it doesn't have the UUID yet. > > From > > > there on, every time it needs to send a "first HB" again, it will reuse > > its > > > UUID, is that the intention? Note that we should consider that the same > > > consumer instance may have many "first heartbeats", meaning heartbeats > to > > > join the group when it's not part of it (ex. consumer unsubscribe + > > > subscribe, fenced, stale). Is this the intention or are you considering > > the > > > lifetime differently? We should clarify it in the KIP. > > > > > > Thanks! > > > > > > Lianet > > > > > > On Tue, Aug 27, 2024 at 2:27 AM TengYao Chi <kiting...@gmail.com> > wrote: > > > > > > > Hi everyone, > > > > > > > > I have revised this KIP multiple times based on the feedback from our > > > > discussions. > > > > I would greatly appreciate it if you could review it when you have > the > > > > time. > > > > If there are no further comments or suggestions, I plan to proceed > with > > > > initiating a vote soon. > > > > > > > > Best regards, > > > > TengYao > > > > > > > > TengYao Chi <kiting...@gmail.com> 於 2024年8月23日 週五 下午2:43寫道: > > > > > > > > > Hi Andrew, > > > > > Thank you for your previous feedback and insights. > > > > > Your contributions have added valuable perspectives to the > > discussions. > > > > > And we also benefit from the comparison of different solutions. > > > > > I’m also looking forward to seeing an initial version in KIP-932, > as > > it > > > > > will provide a good reference for future implementations. > > > > > > > > > > Regarding your comment on AS2, I wanted to clarify that my > > > specification > > > > > references org.apache.kafka.common.Uuid. > > > > > I believe we’re referring to the same class, and it might just be a > > > small > > > > > oversight due to the busy schedule. > > > > > > > > > > I want to express my gratitude once again for your many insightful > > > > > comments, which have helped the discussion progress smoothly. > > > > > > > > > > Best regards, > > > > > TengYao > > > > > > > > > > > > > > > Andrew Schofield <andrew_schofi...@live.com> 於 2024年8月22日 週四 > > > 下午11:28寫道: > > > > > > > > > >> Hi TengYao, > > > > >> I’ve been reading through the comments and I’m happy that the > lobby > > > > >> approach has not gained support. > > > > >> > > > > >> Assuming that this KIP is voted, I will be happy to change KIP-932 > > so > > > > >> that it only supports client-side member ID generation. Because > that > > > KIP > > > > >> is still > > > > >> under development, I can do this in the first version of > > > > >> ShareGroupHeartbeat. > > > > >> > > > > >> AS2: For the encoding section, I suppose the specific encoding > which > > > > >> is used is what org.apache.kafka.utils.Uuid uses. > > > > >> > > > > >> Thanks, > > > > >> Andrew > > > > >> > > > > >> > On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com> > > wrote: > > > > >> > > > > > >> > Hello Apoorv, > > > > >> > Thank you for your feedback. > > > > >> > Regarding the questions you raised, unfortunately, this KIP > cannot > > > > >> > guarantee the order of heartbeats. As with many classic > > distributed > > > > >> system > > > > >> > challenges, what we can do is make our best effort to ensure > that > > > > there > > > > >> are > > > > >> > no idle members or stale assignments under normal circumstances. > > > > >> > > > > > >> > As for the lobby approach, I’m not a fan of it because it > requires > > > > >> adding a > > > > >> > mechanism to maintain client state within the ConsumerGroup, > > which, > > > in > > > > >> my > > > > >> > view, resembles something like a two-phase commit. This would > > > > introduce > > > > >> > more complexity than the proposal in this KIP, which is > something > > we > > > > >> want > > > > >> > to avoid. KIP-848 aims to simplify the existing protocol, and > > while > > > > the > > > > >> > lobby approach is a good one, I believe it is not the right fit > > for > > > > this > > > > >> > particular situation. > > > > >> > > > > > >> > Best regards, > > > > >> > TengYao > > > > >> > > > > > >> > TengYao Chi <kiting...@gmail.com> 於 2024年8月14日 週三 下午11:45寫道: > > > > >> > > > > > >> >> Hi David, > > > > >> >> > > > > >> >> I really appreciate your review and suggestions. As I am still > > > > gaining > > > > >> >> experience in writing KIPs, your input has been incredibly > > > helpful. I > > > > >> am > > > > >> >> currently applying your suggestions to the KIP and will > complete > > it > > > > as > > > > >> soon > > > > >> >> as possible. > > > > >> >> Regarding the UUID part, I think we haven’t reached a > conclusion > > > > >> yet.(So > > > > >> >> far according to this thread) > > > > >> >> However, I will review the current implementation in the Kafka > > > `Uuid` > > > > >> >> class and include a brief specification in the KIP. > > > > >> >> > > > > >> >> Once again, thank you so much for your help. > > > > >> >> > > > > >> >> Best regards, > > > > >> >> TengYao > > > > >> >> > > > > >> >> Chia-Ping Tsai <chia7...@gmail.com> 於 2024年8月14日 週三 下午11:14寫道: > > > > >> >> > > > > >> >>> hi Apoorv > > > > >> >>> > > > > >> >>>> As the memberId is now known to the client, and client might > > send > > > > the > > > > >> >>> leave > > > > >> >>> group heartbeat on shutdown prior to receiving the initial > > > heartbeat > > > > >> >>> response. If that's true then how do we guarantee that the 2 > > > > requests > > > > >> to > > > > >> >>> join and leave will be processed in order, which could still > > leave > > > > >> stale > > > > >> >>> members or throw unknown member id exceptions? > > > > >> >>> > > > > >> >>> This is definitely a good question. the short answer: no > > guarantee > > > > but > > > > >> >>> best > > > > >> >>> efforts > > > > >> >>> > > > > >> >>> Please notice the root cause is "we have no enough time to > wait > > > > >> member id > > > > >> >>> (response) when closing consumer". Sadly, we can' guarantee > the > > > > >> request > > > > >> >>> order due to the same reason. > > > > >> >>> > > > > >> >>> However, in contrast to previous behavior, there is one big > > > benefit > > > > >> of new > > > > >> >>> approach - we can try STONITH because we know the member id > > > > >> >>> > > > > >> >>> Best, > > > > >> >>> Chia-Ping > > > > >> >>> > > > > >> >>> > > > > >> >>> Apoorv Mittal <apoorvmitta...@gmail.com> 於 2024年8月14日 週三 > > > 下午8:55寫道: > > > > >> >>> > > > > >> >>>> Hi TengYao, > > > > >> >>>> Thanks for the KIP. Continuing on the point which Andrew > > > mentioned > > > > as > > > > >> >>> AS1. > > > > >> >>>> > > > > >> >>>> As the memberId is now known to the client, and client might > > send > > > > the > > > > >> >>> leave > > > > >> >>>> group heartbeat on shutdown prior to receiving the initial > > > > heartbeat > > > > >> >>>> response. If that's true then how do we guarantee that the 2 > > > > >> requests to > > > > >> >>>> join and leave will be processed in order, which could still > > > leave > > > > >> stale > > > > >> >>>> members or throw unknown member id exceptions? > > > > >> >>>> > > > > >> >>>> Though the client side member id generation is helpful which > > will > > > > >> >>> represent > > > > >> >>>> the same group perspective as from client and broker's end. > > But I > > > > >> think > > > > >> >>> the > > > > >> >>>> major concern we want to solve here is Stale Partition > > > Assignments > > > > >> which > > > > >> >>>> might still exist with the new approach. I am leaning towards > > the > > > > >> >>>> suggestion mentioned by Andrew where partition assignment > > > triggers > > > > on > > > > >> >>>> subsequent heartbeat when client acknowledges the initial > > > > heartbeat, > > > > >> >>>> delayed partition assignment. > > > > >> >>>> > > > > >> >>>> Though on a separate note, I have a different question. What > > > > happens > > > > >> >>> when > > > > >> >>>> there is an issue with the client which sends the initial > > > heartbeat > > > > >> >>> without > > > > >> >>>> memberId, then crashes and restarts? I think we must be > > > > re-triggering > > > > >> >>>> assignments and expiring members only after the heartbeat > > session > > > > >> >>> timeout? > > > > >> >>>> If that's true then shall delayed partition assignment can > help > > > > >> benefit > > > > >> >>> us > > > > >> >>>> from this situation as well? > > > > >> >>>> > > > > >> >>>> Regards, > > > > >> >>>> Apoorv Mittal > > > > >> >>>> > > > > >> >>>> > > > > >> >>>> On Wed, Aug 14, 2024 at 12:51 PM David Jacot > > > > >> >>> <dja...@confluent.io.invalid> > > > > >> >>>> wrote: > > > > >> >>>> > > > > >> >>>>> Hi Andrew, > > > > >> >>>>> > > > > >> >>>>> Personally, I don't like the lobby approach. It makes things > > > more > > > > >> >>>>> complicated and it would require changing the records on the > > > > server > > > > >> >>> too. > > > > >> >>>>> This is why I initially suggested the rejected alternative > #2 > > > > which > > > > >> is > > > > >> >>>>> pretty close but also not perfect. > > > > >> >>>>> > > > > >> >>>>> I'd like to clarify one thing. The ConsumerGroupHeartbeat > API > > > > >> already > > > > >> >>>>> supports generating the member id on the client so we don't > > need > > > > any > > > > >> >>>>> conditional logic on the client side. This is actually what > we > > > > >> wanted > > > > >> >>> to > > > > >> >>>> do > > > > >> >>>>> in the first place but the idea got pushed back by Magnus > back > > > > then > > > > >> >>>> because > > > > >> >>>>> generating uuid from librdkafka required a new dependency. > It > > > > turns > > > > >> >>> out > > > > >> >>>>> that librdkafka has that dependency today. In retrospect, we > > > > should > > > > >> >>> have > > > > >> >>>>> pushed back on this. Long story short, we can just do it. > The > > > > >> >>> proposal in > > > > >> >>>>> this KIP is to make the member id required in future > versions. > > > We > > > > >> >>> could > > > > >> >>>>> also decide not to do it and to keep supporting both > > > approaches. I > > > > >> >>> would > > > > >> >>>>> also be fine with this. > > > > >> >>>>> > > > > >> >>>>> Best, > > > > >> >>>>> David > > > > >> >>>>> > > > > >> >>>>> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield < > > > > >> >>>>> andrew_schofi...@live.com> > > > > >> >>>>> wrote: > > > > >> >>>>> > > > > >> >>>>>> Hi TengYao, > > > > >> >>>>>> Thanks for your response. I’ll have just one more try to > > > > persuade. > > > > >> >>>>>> I feel that I will need to follow the approach with KIP-932 > > > when > > > > >> >>> we’ve > > > > >> >>>>>> made a decision, so I do have more than a passing interest > in > > > > this. > > > > >> >>>>>> > > > > >> >>>>>> A group member in the lobby is in the group, but it does > not > > > have > > > > >> >>> any > > > > >> >>>>>> assignments. A member of a consumer group can have no > > assigned > > > > >> >>>>>> partitions (such as 5 CG members subscribed to a topic > with 4 > > > > >> >>>>> partitions), > > > > >> >>>>>> so it’s a situation that consumer group members already > > expect. > > > > >> >>>>>> > > > > >> >>>>>> One of Kafka’s strengths is the way that we handle API > > > > versioning. > > > > >> >>>>>> But, there is a cost - the behaviour is different depending > > on > > > > the > > > > >> >>> RPC > > > > >> >>>>>> version. KIP-848 is on the cusp of completion, but we’re > > > already > > > > >> >>> adding > > > > >> >>>>>> conditional logic for v0/v1 for ConsumerGroupHeartbeat. > > That’s > > > a > > > > >> >>> pity. > > > > >> >>>>>> Only a minor issue, but it’s unfortunate. > > > > >> >>>>>> > > > > >> >>>>>> Thanks, > > > > >> >>>>>> Andrew > > > > >> >>>>>> > > > > >> >>>>>>> On 14 Aug 2024, at 08:47, TengYao Chi < > kiting...@gmail.com> > > > > >> >>> wrote: > > > > >> >>>>>>> > > > > >> >>>>>>> Hello Andrew > > > > >> >>>>>>> Thank you for your thoughtful suggestions and getting the > > > > >> >>> discussion > > > > >> >>>>>> going. > > > > >> >>>>>>> > > > > >> >>>>>>> To AS1: > > > > >> >>>>>>> In the current scenario where the server generates the > UUID, > > > if > > > > >> >>> the > > > > >> >>>>>> client > > > > >> >>>>>>> shuts down before receiving the memberId generated by the > GC > > > > >> >>>>> (regardless > > > > >> >>>>>> of > > > > >> >>>>>>> whether it’s a graceful shutdown or not), the GC will > still > > > have > > > > >> >>> to > > > > >> >>>>> wait > > > > >> >>>>>>> for the heartbeat timeout because the client doesn’t know > > its > > > > >> >>>> memberId. > > > > >> >>>>>>> This KIP indeed cannot completely resolve the idempotency > > > issue, > > > > >> >>> but > > > > >> >>>> it > > > > >> >>>>>> can > > > > >> >>>>>>> better handle shutdown scenarios under normal > circumstances > > > > >> >>> because > > > > >> >>>> the > > > > >> >>>>>>> client always knows its memberId. Even if the client shuts > > > down > > > > >> >>>>>> immediately > > > > >> >>>>>>> after the initial heartbeat, as long as it performs a > > graceful > > > > >> >>>> shutdown > > > > >> >>>>>> and > > > > >> >>>>>>> sends a leave heartbeat, the GC can manage the situation > and > > > > >> >>> remove > > > > >> >>>> the > > > > >> >>>>>>> member. Therefore, the goal of this KIP is to address the > > > issue > > > > >> >>> where > > > > >> >>>>> the > > > > >> >>>>>>> GC has to wait for the heartbeat timeout due to the client > > > > leaving > > > > >> >>>>>> without > > > > >> >>>>>>> knowing its memberId, which leads to reduced throughput > and > > > > >> >>> limited > > > > >> >>>>>>> scalability. > > > > >> >>>>>>> > > > > >> >>>>>>> The solution you suggest has also been proposed by David. > > The > > > > >> >>> concern > > > > >> >>>>>> with > > > > >> >>>>>>> this approach is that it introduces additional complexity > > for > > > > >> >>>>>>> compatibility, as the new server would not immediately add > > the > > > > >> >>> member > > > > >> >>>>> to > > > > >> >>>>>>> the group, while the old server would. This requires > clients > > > to > > > > >> >>>>>>> differentiate whether their memberId has been added to the > > > group > > > > >> >>> or > > > > >> >>>>> not, > > > > >> >>>>>>> which could result in unexpected logs. > > > > >> >>>>>>> > > > > >> >>>>>>> Best Regards, > > > > >> >>>>>>> TengYao > > > > >> >>>>>>> > > > > >> >>>>>>> Andrew Schofield <andrew_schofi...@live.com> 於 2024年8月14日 > > 週三 > > > > >> >>>>> 上午12:29寫道: > > > > >> >>>>>>> > > > > >> >>>>>>>> Hi TengYao, > > > > >> >>>>>>>> Thanks for the KIP. I wonder if there’s a different way > to > > > > close > > > > >> >>>> what > > > > >> >>>>>>>> is quite a small window. > > > > >> >>>>>>>> > > > > >> >>>>>>>> AS1: It is true that the initial heartbeat is not > > idempotent, > > > > but > > > > >> >>>> this > > > > >> >>>>>>>> remains > > > > >> >>>>>>>> true with this KIP. It’s just differently not idempotent. > > If > > > > the > > > > >> >>>>> client > > > > >> >>>>>>>> makes its > > > > >> >>>>>>>> own member ID, sends a request and dies, the GC will > still > > > have > > > > >> >>>> added > > > > >> >>>>>>>> the member to the group and it will hang around until the > > > > session > > > > >> >>>>>> expires. > > > > >> >>>>>>>> > > > > >> >>>>>>>> I wonder if the GC could still generate the member ID in > > > > >> >>> response to > > > > >> >>>>> the > > > > >> >>>>>>>> first > > > > >> >>>>>>>> heartbeat, and put the member in a special PENDING state > > with > > > > no > > > > >> >>>>>>>> assignments until the client sends the next heartbeat, > thus > > > > >> >>>> confirming > > > > >> >>>>>> it > > > > >> >>>>>>>> has received the member ID. This would not be a protocol > > > change > > > > >> >>> at > > > > >> >>>>> all, > > > > >> >>>>>>>> just > > > > >> >>>>>>>> a change to the GC to keep a new member in the lobby > until > > > it’s > > > > >> >>>>>> comfirmed > > > > >> >>>>>>>> it knows its member ID. > > > > >> >>>>>>>> > > > > >> >>>>>>>> > > > > >> >>>>>>>> Thanks, > > > > >> >>>>>>>> Andrew > > > > >> >>>>>>>> > > > > >> >>>>>>>>> On 13 Aug 2024, at 15:59, TengYao Chi < > > kiting...@gmail.com> > > > > >> >>> wrote: > > > > >> >>>>>>>>> > > > > >> >>>>>>>>> Hi Chia-Ping, > > > > >> >>>>>>>>> > > > > >> >>>>>>>>> Thanks for review and suggestions. > > > > >> >>>>>>>>> I have updated the content of KIP accordingly. > > > > >> >>>>>>>>> Please take a look. > > > > >> >>>>>>>>> > > > > >> >>>>>>>>> Best regards, > > > > >> >>>>>>>>> TengYao > > > > >> >>>>>>>>> > > > > >> >>>>>>>>> Chia-Ping Tsai <chia7...@apache.org> 於 2024年8月13日 週二 > > > > 下午9:45寫道: > > > > >> >>>>>>>>> > > > > >> >>>>>>>>>> hi TengYao > > > > >> >>>>>>>>>> > > > > >> >>>>>>>>>> thanks for this KIP. > > > > >> >>>>>>>>>> > > > > >> >>>>>>>>>> 1) could you please describe the before/after behavior > in > > > the > > > > >> >>>>>> "Proposed > > > > >> >>>>>>>>>> Changes" section? IIRC, current RPC allows HB having > > member > > > > id > > > > >> >>>>>>>> generated by > > > > >> >>>>>>>>>> client, right? If HB has no member ID, server will > > generate > > > > one > > > > >> >>>> and > > > > >> >>>>>> then > > > > >> >>>>>>>>>> return. The new behavior will enforce HB "must" have > > member > > > > ID. > > > > >> >>>>>>>>>> > > > > >> >>>>>>>>>> 2) could you please write the version number explicitly > > in > > > > the > > > > >> >>> KIP > > > > >> >>>>>>>>>> > > > > >> >>>>>>>>>> 3) how new client code handle the old HB? Does it > always > > > > >> >>> generate > > > > >> >>>>>> member > > > > >> >>>>>>>>>> ID on client-side even though that is not restricted? > > > > >> >>>>>>>>>> > > > > >> >>>>>>>>>> Best, > > > > >> >>>>>>>>>> Chia-Ping > > > > >> >>>>>>>>>> > > > > >> >>>>>>>>>> On 2024/08/13 06:20:42 TengYao Chi wrote: > > > > >> >>>>>>>>>>> Hello everyone, > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> I would like to start a discussion thread on KIP-1082, > > > which > > > > >> >>>>> proposes > > > > >> >>>>>>>>>>> enabling id generation for clients over the > > > > >> >>>> ConsumerGroupHeartbeat > > > > >> >>>>>> RPC. > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> Here is the KIP Link: KIP-1082 > > > > >> >>>>>>>>>>> < > > > > >> >>>>>>>>>> > > > > >> >>>>>>>> > > > > >> >>>>>> > > > > >> >>>>> > > > > >> >>>> > > > > >> >>> > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> Please take a look and let me know what you think, > and I > > > > would > > > > >> >>>>>>>> appreciate > > > > >> >>>>>>>>>>> any suggestions and feedback. > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>>> Best regards, > > > > >> >>>>>>>>>>> TengYao > > > > >> >>>>>>>>>>> > > > > >> >>>>>>>>>> > > > > >> >>>>>>>> > > > > >> >>>>>>>> > > > > >> >>>>>> > > > > >> >>>>>> > > > > >> >>>>> > > > > >> >>>> > > > > >> >>> > > > > >> >> > > > > >> > > > > >> > > > > > > > > > >