Hello everyone,

Since this KIP has been fully discussed, I will initiate a vote for it next Monday.
Thank you and have a nice weekend.

Best regards,
TengYao

On Thu, Sep 5, 2024 at 2:19 PM, TengYao Chi <kiting...@gmail.com> wrote:

Hello everyone,

KT2: It looks like everyone who has expressed an opinion supports the second option: “Document a recommendation for clients to use UUIDs as member IDs, without strictly enforcing it.”
I have updated the KIP accordingly.
Please take a look, and let me know if you have any thoughts or feedback.

Thank you!

Best regards,
TengYao

On Fri, Aug 30, 2024 at 9:56 PM, Chia-Ping Tsai <chia7...@gmail.com> wrote:

hi TengYao

KT2: +1 to second approach

Best,
Chia-Ping

On Fri, Aug 30, 2024 at 9:15 PM, David Jacot <dja...@confluent.io.invalid> wrote:

Hi TengYao,

KT2: I don't think that we can realistically validate the uuid on the server. It is basically a string of chars. So I lean towards having a good recommendation in the KIP and in the documentation of the field in the RPC's definition.

Best,
David

On Fri, Aug 30, 2024 at 3:02 PM TengYao Chi <kiting...@gmail.com> wrote:

Hello Kirk!

Thank you for your comments!

KT1: Yes, you are correct. The issue is not unique to the initial heartbeat; there can always be cases where the broker might lose connection with a member.

KT2: Currently, if the client doesn't have a member ID and the memberEpoch equals 0, the coordinator will generate a UUID as the member ID for the client. However, at the RPC level, the member ID is sent as a literal string, meaning there are no restrictions on the format at this level.
This also reminds me that we haven't reached a final conclusion on how to enforce the use of UUIDs.
From our previous discussions, I recall two possible approaches:
The first is to validate the UUID on the server side, and if it's not valid, throw an exception to the client.
The second is to document a recommendation for clients to use UUIDs as member IDs, without strictly enforcing it.
I think it's time to decide on the approach we want to take.

KT3: Yes, "session" can be considered synonymous with "membership" in this context.

KT4: Thank you for pointing that out. I will update the wording to specifically say this behavior is for consumers.

Thanks again for your comments.

Best regards,
TengYao

On Fri, Aug 30, 2024 at 12:39 AM, Kirk True <k...@kirktrue.pro> wrote:

Hi TengYao!

Sorry for being late to the discussion...

After reading the thread and then the KIP, I had a few questions/comments:

KT1: In Motivation, it states: "This scenario can result in the broker registering a new member for which it will never receive a proper leave request.” Just to be clear, the broker will always have cases where it might lose connection with a member. That’s not unique to the initial heartbeat, right?

KT2: There was a bit of back and forth about the format of the member ID. From what I gathered in the thread, the member ID is still defined in the RPC as a string and not a UUID, right? The KIP states that the “client must generate a UUID as the member ID” and that the “server will validate that a valid UUID is provided.” Is that a change for the server, or is it already enforced as a UUID?
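Because the member ID travels as a plain string at the RPC level, "UUID" is only a convention on top of that string. The following is a minimal sketch of what a client-generated ID and a best-effort format check could look like. The class and method names (`MemberIds`, `newMemberId`, `looksLikeUuid`) are hypothetical, and the 22-character URL-safe Base64 form is an assumption chosen for illustration, not something the thread specifies.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

/** Hypothetical helper; not part of Kafka. */
final class MemberIds {
    /** Encodes a random 128-bit UUID as 22 URL-safe Base64 characters (no padding). */
    static String newMemberId() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    /** Best-effort check only: true if the string decodes to exactly 16 bytes. */
    static boolean looksLikeUuid(String memberId) {
        try {
            return Base64.getUrlDecoder().decode(memberId).length == 16;
        } catch (IllegalArgumentException e) {
            // Not valid URL-safe Base64 at all.
            return false;
        }
    }
}
```

A check like this can only confirm the shape of the string, not that the value was generated randomly, which may be one reason a documented recommendation is easier to live with than strict server-side validation.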

KT3: Lianet mentioned some confusion over the use of the word “session.” Isn’t “session” synonymous with “membership?”

KT4: Under “Member ID Lifecycle,” it states: "The client should reuse the same UUID as the member ID for all heartbeats and rejoin attempts to maintain continuity within the group.” Could we change the first part of that to “The Consumer instance should…” We do have lifetimes that extend past the lifetime of a client instance (such as the transaction ID).

Thanks,
Kirk

On Aug 29, 2024, at 1:28 AM, TengYao Chi <kiting...@gmail.com> wrote:

Hi David,

Thank you for pointing that out.
I have updated the content of the KIP based on Lianet's and your feedback.
Please take a look and let me know your thoughts.

Best regards,
TengYao

On Thu, Aug 29, 2024 at 3:20 PM, David Jacot <dja...@confluent.io.invalid> wrote:

Hi TengYao,

Thanks for the update. I haven't fully read it yet but I will soon.

LM4: This is incorrect. The consumer must keep its member id during its entire lifetime (until the process stops or dies). The protocol stipulates that a member must rejoin with the same member id and the member epoch set to zero when a FENCED_MEMBER_EPOCH error occurs. This allows the member to resynchronize itself. We should not change this behavior. I think that we should see the client-side generated id as an incarnation id of the application. It is generated once and kept until it stops or dies.
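
The lifecycle described in LM4 can be sketched as follows. This is a simplified illustration, not Kafka's client code: the class `MemberState` and its method names are hypothetical, and `java.util.UUID` stands in for whichever UUID implementation the client actually uses.

```java
import java.util.UUID;

/** Hypothetical client-side holder for the member ID and epoch. */
final class MemberState {
    // Generated once per consumer instance and never regenerated.
    private final String memberId = UUID.randomUUID().toString();
    // Epoch 0 means the member is joining (or rejoining) the group.
    private int memberEpoch = 0;

    String memberId() { return memberId; }

    int memberEpoch() { return memberEpoch; }

    /** Called when a heartbeat response carries the epoch assigned by the coordinator. */
    void onHeartbeatSuccess(int newEpoch) { memberEpoch = newEpoch; }

    /** Called on FENCED_MEMBER_EPOCH: keep the member ID, reset the epoch to 0, then rejoin. */
    void onFenced() { memberEpoch = 0; }
}
```

The point of making `memberId` final is that no code path, including fencing, can replace it during the lifetime of the instance.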

Best,
David

On Thu, Aug 29, 2024 at 6:21 AM TengYao Chi <kiting...@gmail.com> wrote:

Hello Lianet!

Thanks for the reviews and suggestions!

LM1: Indeed, we plan to enforce client-side ID generation in the future, and it is not an alternative. I will change the title accordingly.

LM2: Yes, that's the expectation. I will add that statement to the public interface section.

LM3: Thank you for the high-level perspective review. I think you're right; our intention isn't very clear since it was placed at the end of the section. I will try to rephrase that section to make it more obvious.

LM4: Regarding the definition of "session" in this KIP, I believe it refers to the period between the *first-time heartbeat* and when the *consumer leaves the group* (whether through a graceful shutdown or a heartbeat timeout). The consumer should reuse its UUID if it has been generated before. The only situation in which it will regenerate the UUID is if the coordinator finds that there is already a consumer with the same UUID. IIRC, the coordinator should compare the member epochs, and the later-joined consumer should be fenced off by the coordinator due to having a lower member epoch. Once the consumer receives a `FENCED_MEMBER_EPOCH` error, it will generate a new UUID and attempt to rejoin. I will clarify this in the KIP.

Thanks again for your reviews, I really appreciate it.

Best regards,
TengYao

On Wed, Aug 28, 2024 at 7:12 PM, Lianet M. <liane...@gmail.com> wrote:

Hello TengYao! Thanks for taking on this issue, we've been going around it for a while.

LM1: About the title of the KIP: "Enable ID Generation for Clients over the ConsumerGroupHeartbeat RPC". I find it confusing because it hints that we're adding it as an alternative (which was discussed and discarded, in favour of really enforcing it). It's also missing the core change imo, which is "where" the generation happens. So, maybe more to the point with something along the lines of "Client-side generated ID for clients over ConsumerGroupHeartbeat RPC"?

LM2: On the public interfaces section, the KIP states that "the server will reject the request", but we should agree on the specific error type. I expect it should fail with an InvalidRequestException, is that the intention? (This was also suggested in the discussion thread before but is not in the KIP).

LM3: Related to my previous point, I find that to be the true public-facing change (member ID mandatory at the protocol level), but it's only at the end of the Public interfaces changes, kind of lost among details of how we're going to do it. Should we rephrase that section with the actual change first, and the hows after (ex.
Bumping the version is not the public-facing change in this case, it's just the mechanism to properly introduce our change)?

LM4: Regarding the lifetime of the UUID: the KIP states we will "Verify that the UUID remains consistent across all subsequent heartbeats during the session". What is this "session" referring to here? I would expect that the UUID is associated with a consumer instance: generated for the consumer the first time it needs to send a HB if it doesn't have a UUID yet, and from there on reused every time it needs to send a "first HB" again. Is that the intention? Note that we should consider that the same consumer instance may have many "first heartbeats", meaning heartbeats to join the group when it's not part of it (ex. consumer unsubscribe + subscribe, fenced, stale). Is this the intention or are you considering the lifetime differently? We should clarify it in the KIP.

Thanks!

Lianet

On Tue, Aug 27, 2024 at 2:27 AM TengYao Chi <kiting...@gmail.com> wrote:

Hi everyone,

I have revised this KIP multiple times based on the feedback from our discussions.
I would greatly appreciate it if you could review it when you have the time.
If there are no further comments or suggestions, I plan to proceed with initiating a vote soon.

Best regards,
TengYao

On Fri, Aug 23, 2024 at 2:43 PM, TengYao Chi <kiting...@gmail.com> wrote:

Hi Andrew,
Thank you for your previous feedback and insights.
Your contributions have added valuable perspectives to the discussions.
And we also benefit from the comparison of different solutions.
I’m also looking forward to seeing an initial version in KIP-932, as it will provide a good reference for future implementations.

Regarding your comment on AS2, I wanted to clarify that my specification references org.apache.kafka.common.Uuid.
I believe we’re referring to the same class, and it might just be a small oversight due to the busy schedule.

I want to express my gratitude once again for your many insightful comments, which have helped the discussion progress smoothly.

Best regards,
TengYao

On Thu, Aug 22, 2024 at 11:28 PM, Andrew Schofield <andrew_schofi...@live.com> wrote:

Hi TengYao,
I’ve been reading through the comments and I’m happy that the lobby approach has not gained support.

Assuming that this KIP is voted, I will be happy to change KIP-932 so that it only supports client-side member ID generation. Because that KIP is still under development, I can do this in the first version of ShareGroupHeartbeat.

AS2: For the encoding section, I suppose the specific encoding which is used is what org.apache.kafka.utils.Uuid uses.

Thanks,
Andrew

On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com> wrote:

Hello Apoorv,
Thank you for your feedback.
Regarding the questions you raised, unfortunately, this KIP cannot guarantee the order of heartbeats. As with many classic distributed system challenges, what we can do is make our best effort to ensure that there are no idle members or stale assignments under normal circumstances.

As for the lobby approach, I’m not a fan of it because it requires adding a mechanism to maintain client state within the ConsumerGroup, which, in my view, resembles something like a two-phase commit. This would introduce more complexity than the proposal in this KIP, which is something we want to avoid. KIP-848 aims to simplify the existing protocol, and while the lobby approach is a good one, I believe it is not the right fit for this particular situation.

Best regards,
TengYao

On Wed, Aug 14, 2024 at 11:45 PM, TengYao Chi <kiting...@gmail.com> wrote:

Hi David,

I really appreciate your review and suggestions. As I am still gaining experience in writing KIPs, your input has been incredibly helpful. I am currently applying your suggestions to the KIP and will complete it as soon as possible.
Regarding the UUID part, I think we haven’t reached a conclusion yet (so far, according to this thread).
However, I will review the current implementation in the Kafka `Uuid` class and include a brief specification in the KIP.

Once again, thank you so much for your help.

Best regards,
TengYao

On Wed, Aug 14, 2024 at 11:14 PM, Chia-Ping Tsai <chia7...@gmail.com> wrote:

hi Apoorv

> As the memberId is now known to the client, and client might send the leave
> group heartbeat on shutdown prior to receiving the initial heartbeat
> response. If that's true then how do we guarantee that the 2 requests to
> join and leave will be processed in order, which could still leave stale
> members or throw unknown member id exceptions?

This is definitely a good question.
The short answer: no guarantee, but best efforts.

Please notice the root cause is "we do not have enough time to wait for the member id (response) when closing the consumer". Sadly, we can't guarantee the request order for the same reason.

However, in contrast to the previous behavior, there is one big benefit of the new approach - we can try STONITH because we know the member id.

Best,
Chia-Ping

On Wed, Aug 14, 2024 at 8:55 PM, Apoorv Mittal <apoorvmitta...@gmail.com> wrote:

Hi TengYao,
Thanks for the KIP. Continuing on the point which Andrew mentioned as AS1.

As the memberId is now known to the client, the client might send the leave group heartbeat on shutdown prior to receiving the initial heartbeat response. If that's true, then how do we guarantee that the 2 requests to join and leave will be processed in order? Otherwise this could still leave stale members or throw unknown member id exceptions.

Client-side member id generation is helpful, since it gives the client and the broker the same view of the group.
But I think the major concern we want to solve here is stale partition assignments, which might still exist with the new approach. I am leaning towards the suggestion mentioned by Andrew, delayed partition assignment, where partition assignment is triggered on the subsequent heartbeat, once the client has acknowledged the initial heartbeat response.

On a separate note, I have a different question. What happens when there is an issue with the client which sends the initial heartbeat without a memberId, then crashes and restarts? I think we must be re-triggering assignments and expiring members only after the heartbeat session timeout? If that's true, could delayed partition assignment help in this situation as well?

Regards,
Apoorv Mittal

On Wed, Aug 14, 2024 at 12:51 PM David Jacot <dja...@confluent.io.invalid> wrote:

Hi Andrew,

Personally, I don't like the lobby approach. It makes things more complicated and it would require changing the records on the server too.
This is why I initially suggested the rejected alternative #2 which is pretty close but also not perfect.

I'd like to clarify one thing. The ConsumerGroupHeartbeat API already supports generating the member id on the client so we don't need any conditional logic on the client side. This is actually what we wanted to do in the first place but the idea got pushed back by Magnus back then because generating uuid from librdkafka required a new dependency. It turns out that librdkafka has that dependency today. In retrospect, we should have pushed back on this. Long story short, we can just do it. The proposal in this KIP is to make the member id required in future versions. We could also decide not to do it and to keep supporting both approaches. I would also be fine with this.

Best,
David

On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield <andrew_schofi...@live.com> wrote:

Hi TengYao,
Thanks for your response. I’ll have just one more try to persuade.
I feel that I will need to follow the approach with KIP-932 when we’ve made a decision, so I do have more than a passing interest in this.

A group member in the lobby is in the group, but it does not have any assignments. A member of a consumer group can have no assigned partitions (such as 5 CG members subscribed to a topic with 4 partitions), so it’s a situation that consumer group members already expect.

One of Kafka’s strengths is the way that we handle API versioning. But, there is a cost - the behaviour is different depending on the RPC version. KIP-848 is on the cusp of completion, but we’re already adding conditional logic for v0/v1 for ConsumerGroupHeartbeat. That’s a pity. Only a minor issue, but it’s unfortunate.

Thanks,
Andrew

On 14 Aug 2024, at 08:47, TengYao Chi <kiting...@gmail.com> wrote:

Hello Andrew
Thank you for your thoughtful suggestions and getting the discussion going.

To AS1:
In the current scenario where the server generates the UUID, if the client shuts down before receiving the memberId generated by the GC (regardless of whether it’s a graceful shutdown or not), the GC will still have to wait for the heartbeat timeout because the client doesn’t know its memberId.
This KIP indeed cannot completely resolve the idempotency issue, but it can better handle shutdown scenarios under normal circumstances because the client always knows its memberId. Even if the client shuts down immediately after the initial heartbeat, as long as it performs a graceful shutdown and sends a leave heartbeat, the GC can manage the situation and remove the member. Therefore, the goal of this KIP is to address the issue where the GC has to wait for the heartbeat timeout due to the client leaving without knowing its memberId, which leads to reduced throughput and limited scalability.
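
A minimal sketch of the shutdown case described above, assuming a simplified request shape. `Heartbeat` and `ShutdownSketch` are hypothetical names and not Kafka's request classes; the epoch values follow the KIP-848 convention as I understand it, where 0 means join and -1 means leave.

```java
import java.util.UUID;

/** Hypothetical, simplified view of a ConsumerGroupHeartbeat request. */
final class Heartbeat {
    final String memberId;
    final int memberEpoch;

    Heartbeat(String memberId, int memberEpoch) {
        this.memberId = memberId;
        this.memberEpoch = memberEpoch;
    }
}

/** Hypothetical client that generates its own member ID before the first heartbeat. */
final class ShutdownSketch {
    static final int JOIN_EPOCH = 0;   // assumed: epoch 0 requests to join
    static final int LEAVE_EPOCH = -1; // assumed: epoch -1 requests to leave

    // Known locally before any response from the group coordinator arrives.
    private final String memberId = UUID.randomUUID().toString();

    Heartbeat join() {
        return new Heartbeat(memberId, JOIN_EPOCH);
    }

    /** Can be built even if the join response never arrived, since the ID is local. */
    Heartbeat leave() {
        return new Heartbeat(memberId, LEAVE_EPOCH);
    }
}
```

With a server-generated ID, `leave()` would have nothing to put in `memberId` until the first response came back, which is the window the KIP is trying to close. The sketch says nothing about request ordering, which the thread agrees remains best effort.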

The solution you suggest has also been proposed by David. The concern with this approach is that it introduces additional complexity for compatibility, as the new server would not immediately add the member to the group, while the old server would. This requires clients to differentiate whether their memberId has been added to the group or not, which could result in unexpected logs.

Best Regards,
TengYao

On Wed, Aug 14, 2024 at 12:29 AM, Andrew Schofield <andrew_schofi...@live.com> wrote:

Hi TengYao,
Thanks for the KIP. I wonder if there’s a different way to close what is quite a small window.

AS1: It is true that the initial heartbeat is not idempotent, but this remains true with this KIP. It’s just differently not idempotent. If the client makes its own member ID, sends a request and dies, the GC will still have added the member to the group and it will hang around until the session expires.

I wonder if the GC could still generate the member ID in response to the first heartbeat, and put the member in a special PENDING state with no assignments until the client sends the next heartbeat, thus confirming it has received the member ID. This would not be a protocol change at all, just a change to the GC to keep a new member in the lobby until it’s confirmed it knows its member ID.

Thanks,
Andrew

On 13 Aug 2024, at 15:59, TengYao Chi <kiting...@gmail.com> wrote:

Hi Chia-Ping,

Thanks for the review and suggestions.
I have updated the content of the KIP accordingly.
Please take a look.

Best regards,
TengYao

On Tue, Aug 13, 2024 at 9:45 PM, Chia-Ping Tsai <chia7...@apache.org> wrote:

hi TengYao

thanks for this KIP.

1) could you please describe the before/after behavior in the "Proposed Changes" section? IIRC, the current RPC allows a HB to carry a member id generated by the client, right? If the HB has no member ID, the server will generate one and then return it. The new behavior will enforce that the HB "must" have a member ID.

2) could you please write the version number explicitly in the KIP

3) how does the new client code handle the old HB? Does it always generate the member ID on the client side even though that is not required?

Best,
Chia-Ping

On 2024/08/13 06:20:42 TengYao Chi wrote:

Hello everyone,

I would like to start a discussion thread on KIP-1082, which proposes enabling id generation for clients over the ConsumerGroupHeartbeat RPC.

Here is the KIP Link: KIP-1082
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC>

Please take a look and let me know what you think, and I would appreciate any suggestions and feedback.

Best regards,
TengYao