Hi everyone, I have revised this KIP multiple times based on the feedback from our discussions. I would greatly appreciate it if you could review it when you have the time. If there are no further comments or suggestions, I plan to proceed with initiating a vote soon.
Best regards, TengYao TengYao Chi <kiting...@gmail.com> 於 2024年8月23日 週五 下午2:43寫道: > Hi Andrew, > Thank you for your previous feedback and insights. > Your contributions have added valuable perspectives to the discussions. > And we also benefit from the comparison of different solutions. > I’m also looking forward to seeing an initial version in KIP-932, as it > will provide a good reference for future implementations. > > Regarding your comment on AS2, I wanted to clarify that my specification > references org.apache.kafka.common.Uuid. > I believe we’re referring to the same class, and it might just be a small > oversight due to the busy schedule. > > I want to express my gratitude once again for your many insightful > comments, which have helped the discussion progress smoothly. > > Best regards, > TengYao > > > Andrew Schofield <andrew_schofi...@live.com> 於 2024年8月22日 週四 下午11:28寫道: > >> Hi TengYao, >> I’ve been reading through the comments and I’m happy that the lobby >> approach has not gained support. >> >> Assuming that this KIP is voted, I will be happy to change KIP-932 so >> that it only supports client-side member ID generation. Because that KIP >> is still >> under development, I can do this in the first version of >> ShareGroupHeartbeat. >> >> AS2: For the encoding section, I suppose the specific encoding which >> is used is what org.apache.kafka.utils.Uuid uses. >> >> Thanks, >> Andrew >> >> > On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com> wrote: >> > >> > Hello Apoorv, >> > Thank you for your feedback. >> > Regarding the questions you raised, unfortunately, this KIP cannot >> > guarantee the order of heartbeats. As with many classic distributed >> system >> > challenges, what we can do is make our best effort to ensure that there >> are >> > no idle members or stale assignments under normal circumstances. >> > >> > As for the lobby approach, I’m not a fan of it because it requires >> adding a >> > mechanism to maintain client state within the ConsumerGroup, which, in >> my >> > view, resembles something like a two-phase commit. This would introduce >> > more complexity than the proposal in this KIP, which is something we >> want >> > to avoid. KIP-848 aims to simplify the existing protocol, and while the >> > lobby approach is a good one, I believe it is not the right fit for this >> > particular situation. >> > >> > Best regards, >> > TengYao >> > >> > TengYao Chi <kiting...@gmail.com> 於 2024年8月14日 週三 下午11:45寫道: >> > >> >> Hi David, >> >> >> >> I really appreciate your review and suggestions. As I am still gaining >> >> experience in writing KIPs, your input has been incredibly helpful. I >> am >> >> currently applying your suggestions to the KIP and will complete it as >> soon >> >> as possible. >> >> Regarding the UUID part, I think we haven’t reached a conclusion >> yet.(So >> >> far according to this thread) >> >> However, I will review the current implementation in the Kafka `Uuid` >> >> class and include a brief specification in the KIP. >> >> >> >> Once again, thank you so much for your help. >> >> >> >> Best regards, >> >> TengYao >> >> >> >> Chia-Ping Tsai <chia7...@gmail.com> 於 2024年8月14日 週三 下午11:14寫道: >> >> >> >>> hi Apoorv >> >>> >> >>>> As the memberId is now known to the client, and client might send the >> >>> leave >> >>> group heartbeat on shutdown prior to receiving the initial heartbeat >> >>> response. If that's true then how do we guarantee that the 2 requests >> to >> >>> join and leave will be processed in order, which could still leave >> stale >> >>> members or throw unknown member id exceptions? >> >>> >> >>> This is definitely a good question. the short answer: no guarantee but >> >>> best >> >>> efforts >> >>> >> >>> Please notice the root cause is "we have no enough time to wait >> member id >> >>> (response) when closing consumer". Sadly, we can' guarantee the >> request >> >>> order due to the same reason. >> >>> >> >>> However, in contrast to previous behavior, there is one big benefit >> of new >> >>> approach - we can try STONITH because we know the member id >> >>> >> >>> Best, >> >>> Chia-Ping >> >>> >> >>> >> >>> Apoorv Mittal <apoorvmitta...@gmail.com> 於 2024年8月14日 週三 下午8:55寫道: >> >>> >> >>>> Hi TengYao, >> >>>> Thanks for the KIP. Continuing on the point which Andrew mentioned as >> >>> AS1. >> >>>> >> >>>> As the memberId is now known to the client, and client might send the >> >>> leave >> >>>> group heartbeat on shutdown prior to receiving the initial heartbeat >> >>>> response. If that's true then how do we guarantee that the 2 >> requests to >> >>>> join and leave will be processed in order, which could still leave >> stale >> >>>> members or throw unknown member id exceptions? >> >>>> >> >>>> Though the client side member id generation is helpful which will >> >>> represent >> >>>> the same group perspective as from client and broker's end. But I >> think >> >>> the >> >>>> major concern we want to solve here is Stale Partition Assignments >> which >> >>>> might still exist with the new approach. I am leaning towards the >> >>>> suggestion mentioned by Andrew where partition assignment triggers on >> >>>> subsequent heartbeat when client acknowledges the initial heartbeat, >> >>>> delayed partition assignment. >> >>>> >> >>>> Though on a separate note, I have a different question. What happens >> >>> when >> >>>> there is an issue with the client which sends the initial heartbeat >> >>> without >> >>>> memberId, then crashes and restarts? I think we must be re-triggering >> >>>> assignments and expiring members only after the heartbeat session >> >>> timeout? >> >>>> If that's true then shall delayed partition assignment can help >> benefit >> >>> us >> >>>> from this situation as well? >> >>>> >> >>>> Regards, >> >>>> Apoorv Mittal >> >>>> >> >>>> >> >>>> On Wed, Aug 14, 2024 at 12:51 PM David Jacot >> >>> <dja...@confluent.io.invalid> >> >>>> wrote: >> >>>> >> >>>>> Hi Andrew, >> >>>>> >> >>>>> Personally, I don't like the lobby approach. It makes things more >> >>>>> complicated and it would require changing the records on the server >> >>> too. >> >>>>> This is why I initially suggested the rejected alternative #2 which >> is >> >>>>> pretty close but also not perfect. >> >>>>> >> >>>>> I'd like to clarify one thing. The ConsumerGroupHeartbeat API >> already >> >>>>> supports generating the member id on the client so we don't need any >> >>>>> conditional logic on the client side. This is actually what we >> wanted >> >>> to >> >>>> do >> >>>>> in the first place but the idea got pushed back by Magnus back then >> >>>> because >> >>>>> generating uuid from librdkafka required a new dependency. It turns >> >>> out >> >>>>> that librdkafka has that dependency today. In retrospect, we should >> >>> have >> >>>>> pushed back on this. Long story short, we can just do it. The >> >>> proposal in >> >>>>> this KIP is to make the member id required in future versions. We >> >>> could >> >>>>> also decide not to do it and to keep supporting both approaches. I >> >>> would >> >>>>> also be fine with this. >> >>>>> >> >>>>> Best, >> >>>>> David >> >>>>> >> >>>>> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield < >> >>>>> andrew_schofi...@live.com> >> >>>>> wrote: >> >>>>> >> >>>>>> Hi TengYao, >> >>>>>> Thanks for your response. I’ll have just one more try to persuade. >> >>>>>> I feel that I will need to follow the approach with KIP-932 when >> >>> we’ve >> >>>>>> made a decision, so I do have more than a passing interest in this. >> >>>>>> >> >>>>>> A group member in the lobby is in the group, but it does not have >> >>> any >> >>>>>> assignments. A member of a consumer group can have no assigned >> >>>>>> partitions (such as 5 CG members subscribed to a topic with 4 >> >>>>> partitions), >> >>>>>> so it’s a situation that consumer group members already expect. >> >>>>>> >> >>>>>> One of Kafka’s strengths is the way that we handle API versioning. >> >>>>>> But, there is a cost - the behaviour is different depending on the >> >>> RPC >> >>>>>> version. KIP-848 is on the cusp of completion, but we’re already >> >>> adding >> >>>>>> conditional logic for v0/v1 for ConsumerGroupHeartbeat. That’s a >> >>> pity. >> >>>>>> Only a minor issue, but it’s unfortunate. >> >>>>>> >> >>>>>> Thanks, >> >>>>>> Andrew >> >>>>>> >> >>>>>>> On 14 Aug 2024, at 08:47, TengYao Chi <kiting...@gmail.com> >> >>> wrote: >> >>>>>>> >> >>>>>>> Hello Andrew >> >>>>>>> Thank you for your thoughtful suggestions and getting the >> >>> discussion >> >>>>>> going. >> >>>>>>> >> >>>>>>> To AS1: >> >>>>>>> In the current scenario where the server generates the UUID, if >> >>> the >> >>>>>> client >> >>>>>>> shuts down before receiving the memberId generated by the GC >> >>>>> (regardless >> >>>>>> of >> >>>>>>> whether it’s a graceful shutdown or not), the GC will still have >> >>> to >> >>>>> wait >> >>>>>>> for the heartbeat timeout because the client doesn’t know its >> >>>> memberId. >> >>>>>>> This KIP indeed cannot completely resolve the idempotency issue, >> >>> but >> >>>> it >> >>>>>> can >> >>>>>>> better handle shutdown scenarios under normal circumstances >> >>> because >> >>>> the >> >>>>>>> client always knows its memberId. Even if the client shuts down >> >>>>>> immediately >> >>>>>>> after the initial heartbeat, as long as it performs a graceful >> >>>> shutdown >> >>>>>> and >> >>>>>>> sends a leave heartbeat, the GC can manage the situation and >> >>> remove >> >>>> the >> >>>>>>> member. Therefore, the goal of this KIP is to address the issue >> >>> where >> >>>>> the >> >>>>>>> GC has to wait for the heartbeat timeout due to the client leaving >> >>>>>> without >> >>>>>>> knowing its memberId, which leads to reduced throughput and >> >>> limited >> >>>>>>> scalability. >> >>>>>>> >> >>>>>>> The solution you suggest has also been proposed by David. The >> >>> concern >> >>>>>> with >> >>>>>>> this approach is that it introduces additional complexity for >> >>>>>>> compatibility, as the new server would not immediately add the >> >>> member >> >>>>> to >> >>>>>>> the group, while the old server would. This requires clients to >> >>>>>>> differentiate whether their memberId has been added to the group >> >>> or >> >>>>> not, >> >>>>>>> which could result in unexpected logs. >> >>>>>>> >> >>>>>>> Best Regards, >> >>>>>>> TengYao >> >>>>>>> >> >>>>>>> Andrew Schofield <andrew_schofi...@live.com> 於 2024年8月14日 週三 >> >>>>> 上午12:29寫道: >> >>>>>>> >> >>>>>>>> Hi TengYao, >> >>>>>>>> Thanks for the KIP. I wonder if there’s a different way to close >> >>>> what >> >>>>>>>> is quite a small window. >> >>>>>>>> >> >>>>>>>> AS1: It is true that the initial heartbeat is not idempotent, but >> >>>> this >> >>>>>>>> remains >> >>>>>>>> true with this KIP. It’s just differently not idempotent. If the >> >>>>> client >> >>>>>>>> makes its >> >>>>>>>> own member ID, sends a request and dies, the GC will still have >> >>>> added >> >>>>>>>> the member to the group and it will hang around until the session >> >>>>>> expires. >> >>>>>>>> >> >>>>>>>> I wonder if the GC could still generate the member ID in >> >>> response to >> >>>>> the >> >>>>>>>> first >> >>>>>>>> heartbeat, and put the member in a special PENDING state with no >> >>>>>>>> assignments until the client sends the next heartbeat, thus >> >>>> confirming >> >>>>>> it >> >>>>>>>> has received the member ID. This would not be a protocol change >> >>> at >> >>>>> all, >> >>>>>>>> just >> >>>>>>>> a change to the GC to keep a new member in the lobby until it’s >> >>>>>> comfirmed >> >>>>>>>> it knows its member ID. >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Thanks, >> >>>>>>>> Andrew >> >>>>>>>> >> >>>>>>>>> On 13 Aug 2024, at 15:59, TengYao Chi <kiting...@gmail.com> >> >>> wrote: >> >>>>>>>>> >> >>>>>>>>> Hi Chia-Ping, >> >>>>>>>>> >> >>>>>>>>> Thanks for review and suggestions. >> >>>>>>>>> I have updated the content of KIP accordingly. >> >>>>>>>>> Please take a look. >> >>>>>>>>> >> >>>>>>>>> Best regards, >> >>>>>>>>> TengYao >> >>>>>>>>> >> >>>>>>>>> Chia-Ping Tsai <chia7...@apache.org> 於 2024年8月13日 週二 下午9:45寫道: >> >>>>>>>>> >> >>>>>>>>>> hi TengYao >> >>>>>>>>>> >> >>>>>>>>>> thanks for this KIP. >> >>>>>>>>>> >> >>>>>>>>>> 1) could you please describe the before/after behavior in the >> >>>>>> "Proposed >> >>>>>>>>>> Changes" section? IIRC, current RPC allows HB having member id >> >>>>>>>> generated by >> >>>>>>>>>> client, right? If HB has no member ID, server will generate one >> >>>> and >> >>>>>> then >> >>>>>>>>>> return. The new behavior will enforce HB "must" have member ID. >> >>>>>>>>>> >> >>>>>>>>>> 2) could you please write the version number explicitly in the >> >>> KIP >> >>>>>>>>>> >> >>>>>>>>>> 3) how new client code handle the old HB? Does it always >> >>> generate >> >>>>>> member >> >>>>>>>>>> ID on client-side even though that is not restricted? >> >>>>>>>>>> >> >>>>>>>>>> Best, >> >>>>>>>>>> Chia-Ping >> >>>>>>>>>> >> >>>>>>>>>> On 2024/08/13 06:20:42 TengYao Chi wrote: >> >>>>>>>>>>> Hello everyone, >> >>>>>>>>>>> >> >>>>>>>>>>> I would like to start a discussion thread on KIP-1082, which >> >>>>> proposes >> >>>>>>>>>>> enabling id generation for clients over the >> >>>> ConsumerGroupHeartbeat >> >>>>>> RPC. >> >>>>>>>>>>> >> >>>>>>>>>>> Here is the KIP Link: KIP-1082 >> >>>>>>>>>>> < >> >>>>>>>>>> >> >>>>>>>> >> >>>>>> >> >>>>> >> >>>> >> >>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC >> >>>>>>>>>>> >> >>>>>>>>>>> Please take a look and let me know what you think, and I would >> >>>>>>>> appreciate >> >>>>>>>>>>> any suggestions and feedback. >> >>>>>>>>>>> >> >>>>>>>>>>> Best regards, >> >>>>>>>>>>> TengYao >> >>>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>> >> >>>>>> >> >>>>> >> >>>> >> >>> >> >> >> >>