Hi TengYao,
I’ve been reading through the comments and I’m happy that the lobby approach has not gained support.
Assuming that this KIP is voted, I will be happy to change KIP-932 so that it only supports client-side member ID generation. Because that KIP is still under development, I can do this in the first version of ShareGroupHeartbeat.

AS2: For the encoding section, I suppose the specific encoding which is used is what org.apache.kafka.common.Uuid uses.

Thanks,
Andrew

> On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com> wrote:
>
> Hello Apoorv,
> Thank you for your feedback.
> Regarding the questions you raised, unfortunately, this KIP cannot guarantee the order of heartbeats. As with many classic distributed system challenges, what we can do is make our best effort to ensure that there are no idle members or stale assignments under normal circumstances.
>
> As for the lobby approach, I’m not a fan of it because it requires adding a mechanism to maintain client state within the ConsumerGroup, which, in my view, resembles something like a two-phase commit. This would introduce more complexity than the proposal in this KIP, which is something we want to avoid. KIP-848 aims to simplify the existing protocol, and while the lobby approach is a good one, I believe it is not the right fit for this particular situation.
>
> Best regards,
> TengYao
>
> On Wed, Aug 14, 2024 at 11:45 PM TengYao Chi <kiting...@gmail.com> wrote:
>
>> Hi David,
>>
>> I really appreciate your review and suggestions. As I am still gaining experience in writing KIPs, your input has been incredibly helpful. I am currently applying your suggestions to the KIP and will complete it as soon as possible.
>> Regarding the UUID part, I think we haven’t reached a conclusion yet (at least so far in this thread). However, I will review the current implementation in the Kafka `Uuid` class and include a brief specification in the KIP.
>>
>> Once again, thank you so much for your help.
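On the encoding question (AS2), here is a minimal Java sketch of client-side member ID generation. It assumes the string form is the one org.apache.kafka.common.Uuid produces: the 16 UUID bytes (most-significant long first) encoded as URL-safe Base64 without padding, which always gives 22 characters. Regenerating IDs that start with '-' is my understanding of what Kafka's Uuid.randomUuid() does and should be checked against the source. The class and method names are hypothetical, and java.util.UUID stands in for Kafka's own class so the sketch has no dependencies.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper illustrating a Kafka-Uuid-style member ID encoding.
class MemberIdSketch {

    // Encodes a 128-bit value as 16 big-endian bytes (msb first, then lsb)
    // using URL-safe Base64 without padding: 16 bytes -> 22 characters.
    static String encode(long mostSigBits, long leastSigBits) {
        ByteBuffer buf = ByteBuffer.allocate(16); // ByteBuffer is big-endian by default
        buf.putLong(mostSigBits);
        buf.putLong(leastSigBits);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Generates a random member ID on the client. IDs whose string form
    // starts with '-' are regenerated (assumed to mirror Uuid.randomUuid()).
    static String newMemberId() {
        while (true) {
            UUID u = UUID.randomUUID();
            String id = encode(u.getMostSignificantBits(), u.getLeastSignificantBits());
            if (!id.startsWith("-")) {
                return id;
            }
        }
    }

    public static void main(String[] args) {
        String id = newMemberId();
        System.out.println(id + " (" + id.length() + " chars)");
        // The all-zero value encodes to 22 'A' characters.
        System.out.println(encode(0L, 0L));
    }
}
```

Because the client creates the ID once and reuses it for every heartbeat, including a leave heartbeat sent on shutdown, it never depends on seeing the first response.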
>>
>> Best regards,
>> TengYao
>>
>> On Wed, Aug 14, 2024 at 11:14 PM Chia-Ping Tsai <chia7...@gmail.com> wrote:
>>
>>> hi Apoorv
>>>
>>>> As the memberId is now known to the client, the client might send the leave group heartbeat on shutdown prior to receiving the initial heartbeat response. If that's true, then how do we guarantee that the 2 requests to join and leave will be processed in order? If they are not, we could still leave stale members or throw unknown member id exceptions.
>>>
>>> This is definitely a good question. The short answer: there is no guarantee, only best effort.
>>>
>>> Please note that the root cause is "we do not have enough time to wait for the member ID (response) when closing the consumer". Sadly, for the same reason, we can't guarantee the request order.
>>>
>>> However, in contrast to the previous behavior, there is one big benefit of the new approach: we can try STONITH because we know the member ID.
>>>
>>> Best,
>>> Chia-Ping
>>>
>>>
>>> On Wed, Aug 14, 2024 at 8:55 PM Apoorv Mittal <apoorvmitta...@gmail.com> wrote:
>>>
>>>> Hi TengYao,
>>>> Thanks for the KIP. Continuing on the point which Andrew mentioned as AS1.
>>>>
>>>> As the memberId is now known to the client, the client might send the leave group heartbeat on shutdown prior to receiving the initial heartbeat response. If that's true, then how do we guarantee that the 2 requests to join and leave will be processed in order? If they are not, we could still leave stale members or throw unknown member id exceptions.
>>>>
>>>> Client-side member ID generation is helpful, as the client and the broker will then have the same view of the group. But I think the major concern we want to solve here is Stale Partition Assignments, which might still exist with the new approach.
>>>> I am leaning towards the suggestion mentioned by Andrew, delayed partition assignment, where partition assignment is triggered on the subsequent heartbeat once the client acknowledges the initial heartbeat.
>>>>
>>>> On a separate note, I have a different question. What happens when there is an issue with the client, which sends the initial heartbeat without a memberId, then crashes and restarts? Am I right that we only re-trigger assignments and expire members after the heartbeat session timeout? If so, could delayed partition assignment help us in this situation as well?
>>>>
>>>> Regards,
>>>> Apoorv Mittal
>>>>
>>>>
>>>> On Wed, Aug 14, 2024 at 12:51 PM David Jacot <dja...@confluent.io.invalid> wrote:
>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> Personally, I don't like the lobby approach. It makes things more complicated and it would require changing the records on the server too. This is why I initially suggested rejected alternative #2, which is pretty close but also not perfect.
>>>>>
>>>>> I'd like to clarify one thing. The ConsumerGroupHeartbeat API already supports generating the member id on the client, so we don't need any conditional logic on the client side. This is actually what we wanted to do in the first place, but the idea got pushed back by Magnus back then because generating a UUID from librdkafka required a new dependency. It turns out that librdkafka has that dependency today. In retrospect, we should have pushed back on this. Long story short, we can just do it. The proposal in this KIP is to make the member id required in future versions. We could also decide not to do it and keep supporting both approaches. I would also be fine with this.
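To make the versioning point concrete, here is a minimal sketch of how a coordinator might resolve the member ID of a joining member once the ID becomes mandatory. It assumes, as the v0/v1 remark in this thread suggests, that version 1 is the first ConsumerGroupHeartbeat version requiring a client-generated ID. The class, constant and method names are hypothetical, rejection is modelled as an empty Optional rather than any real error code, and the legacy path uses java.util.UUID purely for illustration.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch of version-dependent member ID handling on join.
class JoinMemberIdSketch {

    // Assumption: first ConsumerGroupHeartbeat version that requires the
    // client to supply its own member ID.
    static final short FIRST_VERSION_REQUIRING_MEMBER_ID = 1;

    // Returns the member ID to use for a joining member, or Optional.empty()
    // if the request must be rejected (missing ID on a newer version).
    static Optional<String> resolveJoiningMemberId(short apiVersion, String memberId) {
        boolean missing = memberId == null || memberId.isEmpty();
        if (!missing) {
            return Optional.of(memberId); // client-generated ID is used as-is
        }
        if (apiVersion >= FIRST_VERSION_REQUIRING_MEMBER_ID) {
            return Optional.empty(); // newer versions: an empty ID is invalid
        }
        // Older versions: the server still generates the ID and returns it.
        return Optional.of(UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        System.out.println(resolveJoiningMemberId((short) 1, "client-id"));    // Optional[client-id]
        System.out.println(resolveJoiningMemberId((short) 1, "").isPresent()); // false
        System.out.println(resolveJoiningMemberId((short) 0, "").isPresent()); // true
    }
}
```

Keeping the version-0 branch corresponds to "keep supporting both approaches"; the client side needs no branching either way, since it can always send its own ID.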
>>>>>
>>>>> Best,
>>>>> David
>>>>>
>>>>> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield <andrew_schofi...@live.com> wrote:
>>>>>
>>>>>> Hi TengYao,
>>>>>> Thanks for your response. I’ll have just one more try to persuade.
>>>>>> I feel that I will need to follow the approach with KIP-932 when we’ve made a decision, so I do have more than a passing interest in this.
>>>>>>
>>>>>> A group member in the lobby is in the group, but it does not have any assignments. A member of a consumer group can have no assigned partitions (such as 5 CG members subscribed to a topic with 4 partitions), so it’s a situation that consumer group members already expect.
>>>>>>
>>>>>> One of Kafka’s strengths is the way that we handle API versioning. But there is a cost: the behaviour is different depending on the RPC version. KIP-848 is on the cusp of completion, but we’re already adding conditional logic for v0/v1 for ConsumerGroupHeartbeat. That’s a pity. Only a minor issue, but it’s unfortunate.
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew
>>>>>>
>>>>>>> On 14 Aug 2024, at 08:47, TengYao Chi <kiting...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hello Andrew,
>>>>>>> Thank you for your thoughtful suggestions and for getting the discussion going.
>>>>>>>
>>>>>>> To AS1:
>>>>>>> In the current scenario where the server generates the UUID, if the client shuts down before receiving the memberId generated by the GC (regardless of whether it’s a graceful shutdown or not), the GC will still have to wait for the heartbeat timeout because the client doesn’t know its memberId.
>>>>>>> This KIP indeed cannot completely resolve the idempotency issue, but it can better handle shutdown scenarios under normal circumstances because the client always knows its memberId.
>>>>>>> Even if the client shuts down immediately after the initial heartbeat, as long as it performs a graceful shutdown and sends a leave heartbeat, the GC can manage the situation and remove the member. Therefore, the goal of this KIP is to address the issue where the GC has to wait for the heartbeat timeout due to the client leaving without knowing its memberId, which leads to reduced throughput and limited scalability.
>>>>>>>
>>>>>>> The solution you suggest has also been proposed by David. The concern with this approach is that it introduces additional complexity for compatibility, as the new server would not immediately add the member to the group, while the old server would. This requires clients to differentiate whether their memberId has been added to the group or not, which could result in unexpected logs.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> TengYao
>>>>>>>
>>>>>>> On Wed, Aug 14, 2024 at 12:29 AM Andrew Schofield <andrew_schofi...@live.com> wrote:
>>>>>>>
>>>>>>>> Hi TengYao,
>>>>>>>> Thanks for the KIP. I wonder if there’s a different way to close what is quite a small window.
>>>>>>>>
>>>>>>>> AS1: It is true that the initial heartbeat is not idempotent, but this remains true with this KIP. It’s just differently not idempotent. If the client makes its own member ID, sends a request and dies, the GC will still have added the member to the group and it will hang around until the session expires.
>>>>>>>>
>>>>>>>> I wonder if the GC could still generate the member ID in response to the first heartbeat, and put the member in a special PENDING state with no assignments until the client sends the next heartbeat, thus confirming it has received the member ID.
>>>>>>>> This would not be a protocol change at all, just a change to the GC to keep a new member in the lobby until it’s confirmed it knows its member ID.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>>> On 13 Aug 2024, at 15:59, TengYao Chi <kiting...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Chia-Ping,
>>>>>>>>>
>>>>>>>>> Thanks for the review and suggestions.
>>>>>>>>> I have updated the content of the KIP accordingly.
>>>>>>>>> Please take a look.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> TengYao
>>>>>>>>>
>>>>>>>>> On Tue, Aug 13, 2024 at 9:45 PM Chia-Ping Tsai <chia7...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> hi TengYao
>>>>>>>>>>
>>>>>>>>>> thanks for this KIP.
>>>>>>>>>>
>>>>>>>>>> 1) Could you please describe the before/after behavior in the "Proposed Changes" section? IIRC, the current RPC allows a HB to carry a member ID generated by the client, right? If the HB has no member ID, the server will generate one and return it. The new behavior will enforce that the HB "must" have a member ID.
>>>>>>>>>>
>>>>>>>>>> 2) Could you please write the version number explicitly in the KIP?
>>>>>>>>>>
>>>>>>>>>> 3) How does the new client code handle the old HB? Does it always generate the member ID on the client side even though that is not enforced?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Chia-Ping
>>>>>>>>>>
>>>>>>>>>> On 2024/08/13 06:20:42 TengYao Chi wrote:
>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>
>>>>>>>>>>> I would like to start a discussion thread on KIP-1082, which proposes enabling ID generation for clients over the ConsumerGroupHeartbeat RPC.
>>>>>>>>>>>
>>>>>>>>>>> Here is the KIP Link: KIP-1082
>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC>
>>>>>>>>>>>
>>>>>>>>>>> Please take a look and let me know what you think, and I would appreciate any suggestions and feedback.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> TengYao
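The shutdown and ordering questions in this thread can be illustrated with a toy, in-memory coordinator. This is a minimal sketch, not the group coordinator's implementation. It assumes the KIP-848 member epoch conventions (0 to join, -1 to leave) and ignores assignments, epoch bumps and session timeouts. All names are hypothetical. It shows that a client which generated its own member ID can leave before ever seeing the join response, and that a leave processed before the join still leaves a stale member, matching the "no guarantee, only best effort" answer.

```java
import java.util.HashSet;
import java.util.Set;

// Toy coordinator: tracks only which member IDs are in the group.
class ToyCoordinator {
    static final int JOIN_EPOCH = 0;   // assumed KIP-848 convention: join
    static final int LEAVE_EPOCH = -1; // assumed KIP-848 convention: leave

    final Set<String> members = new HashSet<>();

    // Processes one heartbeat and returns a short outcome label.
    String heartbeat(String memberId, int memberEpoch) {
        if (memberEpoch == JOIN_EPOCH) {
            members.add(memberId);
            return "JOINED";
        }
        if (memberEpoch == LEAVE_EPOCH) {
            return members.remove(memberId) ? "LEFT" : "UNKNOWN_MEMBER_ID";
        }
        return members.contains(memberId) ? "OK" : "UNKNOWN_MEMBER_ID";
    }

    public static void main(String[] args) {
        // In order: the client generated "m1" itself, so it can send the leave
        // heartbeat on shutdown even if the join response never arrived.
        ToyCoordinator gc = new ToyCoordinator();
        System.out.println(gc.heartbeat("m1", JOIN_EPOCH));  // JOINED
        System.out.println(gc.heartbeat("m1", LEAVE_EPOCH)); // LEFT
        System.out.println(gc.members.isEmpty());            // true

        // Out of order: the leave is processed first, so the later join
        // leaves a stale member until its session times out.
        ToyCoordinator gc2 = new ToyCoordinator();
        System.out.println(gc2.heartbeat("m2", LEAVE_EPOCH)); // UNKNOWN_MEMBER_ID
        System.out.println(gc2.heartbeat("m2", JOIN_EPOCH));  // JOINED
        System.out.println(gc2.members.contains("m2"));       // true
    }
}
```

With server-generated IDs, the first scenario is not available at all: a client that never received the response has no ID to put in its leave heartbeat, so the member stays until the session timeout.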