Hi TengYao,

KT2: +1 to the second approach.

Best,
Chia-Ping

David Jacot <dja...@confluent.io.invalid> wrote on Fri, Aug 30, 2024 at 9:15 PM:

> Hi TengYao,
>
> KT2: I don't think that we can realistically validate the UUID on the server. It is basically a string of chars. So I lean towards having a good recommendation in the KIP and in the documentation of the field in the RPC's definition.
>
> Best,
> David
>
> On Fri, Aug 30, 2024 at 3:02 PM TengYao Chi <kiting...@gmail.com> wrote:
>
> > Hello Kirk!
> >
> > Thank you for your comments!
> >
> > KT1: Yes, you are correct. The issue is not unique to the initial heartbeat; there can always be cases where the broker might lose connection with a member.
> >
> > KT2: Currently, if the client doesn't have a member ID and the memberEpoch equals 0, the coordinator will generate a UUID as the member ID for the client. However, at the RPC level, the member ID is sent as a literal string, meaning there are no restrictions on the format at this level.
> > This also reminds me that we haven't reached a final conclusion on how to enforce the use of UUIDs.
> > From our previous discussions, I recall two possible approaches:
> > The first is to validate the UUID on the server side, and if it's not valid, throw an exception to the client.
> > The second is to document a recommendation for clients to use UUIDs as member IDs, without strictly enforcing it.
> > I think it's time to decide on the approach we want to take.
> >
> > KT3: Yes, "session" can be considered synonymous with "membership" in this context.
> >
> > KT4: Thank you for pointing that out. I will update the wording to specifically say this behavior is for consumers.
> >
> > Thanks again for your comments.
> >
> > Best regards,
> > TengYao
> >
> > Kirk True <k...@kirktrue.pro> wrote on Fri, Aug 30, 2024 at 12:39 AM:
> >
> > > Hi TengYao!
> > >
> > > Sorry for being late to the discussion...
> > >
> > > After reading the thread and then the KIP, I had a few questions/comments:
> > >
> > > KT1: In Motivation, it states: "This scenario can result in the broker registering a new member for which it will never receive a proper leave request." Just to be clear, the broker will always have cases where it might lose connection with a member. That's not unique to the initial heartbeat, right?
> > >
> > > KT2: There was a bit of back and forth about the format of the member ID. From what I gathered in the thread, the member ID is still defined in the RPC as a string and not a UUID, right? The KIP states that the "client must generate a UUID as the member ID" and that the "server will validate that a valid UUID is provided." Is that a change for the server, or is it already enforced as a UUID?
> > >
> > > KT3: Lianet mentioned some confusion over the use of the word "session." Isn't "session" synonymous with "membership?"
> > >
> > > KT4: Under "Member ID Lifecycle," it states: "The client should reuse the same UUID as the member ID for all heartbeats and rejoin attempts to maintain continuity within the group." Could we change the first part of that to "The Consumer instance should..."? We do have lifetimes that extend past the lifetime of a client instance (such as the transaction ID).
> > >
> > > Thanks,
> > > Kirk
> > >
> > > > On Aug 29, 2024, at 1:28 AM, TengYao Chi <kiting...@gmail.com> wrote:
> > > >
> > > > Hi David,
> > > >
> > > > Thank you for pointing that out.
> > > > I have updated the content of the KIP based on Lianet's and your feedback.
> > > > Please take a look and let me know your thoughts.
> > > >
> > > > Best regards,
> > > > TengYao
> > > >
> > > > David Jacot <dja...@confluent.io.invalid> wrote on Thu, Aug 29, 2024 at 3:20 PM:
> > > >
> > > >> Hi TengYao,
> > > >>
> > > >> Thanks for the update.
> > > >> I haven't fully read it yet but I will soon.
> > > >>
> > > >> LM4: This is incorrect. The consumer must keep its member id during its entire lifetime (until the process stops or dies). The protocol stipulates that a member must rejoin with the same member id and the member epoch set to zero when a FENCED_MEMBER_EPOCH occurs. This allows the member to resynchronize itself. We should not change this behavior. I think that we should see the client-side generated id as an incarnation id of the application. It is generated once and kept until it stops or dies.
> > > >>
> > > >> Best,
> > > >> David
> > > >>
> > > >> On Thu, Aug 29, 2024 at 6:21 AM TengYao Chi <kiting...@gmail.com> wrote:
> > > >>
> > > >>> Hello Lianet!
> > > >>>
> > > >>> Thanks for the reviews and suggestions!
> > > >>>
> > > >>> LM1: Indeed, we plan to enforce client-side ID generation in the future, and it is not an alternative. I will change the title accordingly.
> > > >>>
> > > >>> LM2: Yes, that's the expectation. I will add that statement to the public interface section.
> > > >>>
> > > >>> LM3: Thank you for the high-level perspective review. I think you're right; our intention isn't very clear since it was placed at the end of the section. I will try to rephrase that section to make it more obvious.
> > > >>>
> > > >>> LM4: Regarding the definition of "session" in this KIP, I believe it refers to the period between the *first-time heartbeat* and when the *consumer leaves the group* (whether through a graceful shutdown or a heartbeat timeout). The consumer should reuse its UUID if it has been generated before. The only situation in which it will regenerate the UUID is if the coordinator finds that there is already a consumer with the same UUID.
> > > >>> IIRC, the coordinator should compare the member epochs, and the later-joined consumer should be fenced off by the coordinator due to having a lower member epoch. Once the consumer receives a `FENCED_MEMBER_EPOCH` error, it will generate a new UUID and attempt to rejoin. I will clarify this in the KIP.
> > > >>>
> > > >>> Thanks again for your reviews, I really appreciate it.
> > > >>>
> > > >>> Best regards,
> > > >>> TengYao
> > > >>>
> > > >>> Lianet M. <liane...@gmail.com> wrote on Wed, Aug 28, 2024 at 7:12 PM:
> > > >>>
> > > >>>> Hello TengYao! Thanks for taking on this issue, we've been going around it for a while.
> > > >>>>
> > > >>>> LM1: About the title of the KIP: "Enable ID Generation for Clients over the ConsumerGroupHeartbeat RPC". I find it confusing because it hints that we're adding it as an alternative (which was discussed and discarded, in favour of really enforcing it). It's also missing the core change imo, which is "where" the generation happens. So, maybe more to the point with something along the lines of "Client-side generated ID for clients over ConsumerGroupHeartbeat RPC"?
> > > >>>>
> > > >>>> LM2: On the public interfaces section, the KIP states that "the server will reject the request", but we should agree on the specific error type. I expect it should fail with an InvalidRequestException, is that the intention? (This was also suggested in the discussion thread before but is not in the KIP).
> > > >>>>
> > > >>>> LM3. Related to my previous point, I find that to be the true public-facing change (member ID mandatory at the protocol level), but it's only at the end of the Public interfaces changes, kind of lost among details of how we're going to do it. Should we rephrase that section with the actual change first, and the hows after? (ex. Bumping the version is not the public-facing change in this case, it's just the mechanism to properly introduce our change)
> > > >>>>
> > > >>>> LM4. Regarding the lifetime of the UUID: the KIP states we will "Verify that the UUID remains consistent across all subsequent heartbeats during the session". What is this "session" referring to here? I would expect that the UUID is associated to a consumer instance (generated for the consumer the first time it needs to send a HB if it doesn't have the UUID yet). From there on, every time it needs to send a "first HB" again, it will reuse its UUID, is that the intention? Note that we should consider that the same consumer instance may have many "first heartbeats", meaning heartbeats to join the group when it's not part of it (ex. consumer unsubscribe + subscribe, fenced, stale). Is this the intention or are you considering the lifetime differently? We should clarify it in the KIP.
> > > >>>>
> > > >>>> Thanks!
> > > >>>>
> > > >>>> Lianet
> > > >>>>
> > > >>>> On Tue, Aug 27, 2024 at 2:27 AM TengYao Chi <kiting...@gmail.com> wrote:
> > > >>>>
> > > >>>>> Hi everyone,
> > > >>>>>
> > > >>>>> I have revised this KIP multiple times based on the feedback from our discussions.
> > > >>>>> I would greatly appreciate it if you could review it when you have the time.
> > > >>>>> If there are no further comments or suggestions, I plan to proceed with initiating a vote soon.
> > > >>>>>
> > > >>>>> Best regards,
> > > >>>>> TengYao
> > > >>>>>
> > > >>>>> TengYao Chi <kiting...@gmail.com> wrote on Fri, Aug 23, 2024 at 2:43 PM:
> > > >>>>>
> > > >>>>>> Hi Andrew,
> > > >>>>>> Thank you for your previous feedback and insights.
> > > >>>>>> Your contributions have added valuable perspectives to the discussions.
> > > >>>>>> And we also benefit from the comparison of different solutions.
> > > >>>>>> I'm also looking forward to seeing an initial version in KIP-932, as it will provide a good reference for future implementations.
> > > >>>>>>
> > > >>>>>> Regarding your comment on AS2, I wanted to clarify that my specification references org.apache.kafka.common.Uuid.
> > > >>>>>> I believe we're referring to the same class, and it might just be a small oversight due to the busy schedule.
> > > >>>>>>
> > > >>>>>> I want to express my gratitude once again for your many insightful comments, which have helped the discussion progress smoothly.
> > > >>>>>>
> > > >>>>>> Best regards,
> > > >>>>>> TengYao
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Andrew Schofield <andrew_schofi...@live.com> wrote on Thu, Aug 22, 2024 at 11:28 PM:
> > > >>>>>>
> > > >>>>>>> Hi TengYao,
> > > >>>>>>> I've been reading through the comments and I'm happy that the lobby approach has not gained support.
> > > >>>>>>>
> > > >>>>>>> Assuming that this KIP is voted, I will be happy to change KIP-932 so that it only supports client-side member ID generation.
> > > >>>>>>> Because that KIP is still under development, I can do this in the first version of ShareGroupHeartbeat.
> > > >>>>>>>
> > > >>>>>>> AS2: For the encoding section, I suppose the specific encoding which is used is what org.apache.kafka.utils.Uuid uses.
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Andrew
> > > >>>>>>>
> > > >>>>>>>> On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>> Hello Apoorv,
> > > >>>>>>>> Thank you for your feedback.
> > > >>>>>>>> Regarding the questions you raised, unfortunately, this KIP cannot guarantee the order of heartbeats. As with many classic distributed system challenges, what we can do is make our best effort to ensure that there are no idle members or stale assignments under normal circumstances.
> > > >>>>>>>>
> > > >>>>>>>> As for the lobby approach, I'm not a fan of it because it requires adding a mechanism to maintain client state within the ConsumerGroup, which, in my view, resembles something like a two-phase commit. This would introduce more complexity than the proposal in this KIP, which is something we want to avoid. KIP-848 aims to simplify the existing protocol, and while the lobby approach is a good one, I believe it is not the right fit for this particular situation.
> > > >>>>>>>>
> > > >>>>>>>> Best regards,
> > > >>>>>>>> TengYao
> > > >>>>>>>>
> > > >>>>>>>> TengYao Chi <kiting...@gmail.com> wrote on Wed, Aug 14, 2024 at 11:45 PM:
> > > >>>>>>>>
> > > >>>>>>>>> Hi David,
> > > >>>>>>>>>
> > > >>>>>>>>> I really appreciate your review and suggestions. As I am still gaining experience in writing KIPs, your input has been incredibly helpful. I am currently applying your suggestions to the KIP and will complete it as soon as possible.
> > > >>>>>>>>> Regarding the UUID part, I think we haven't reached a conclusion yet (so far, according to this thread).
> > > >>>>>>>>> However, I will review the current implementation in the Kafka `Uuid` class and include a brief specification in the KIP.
> > > >>>>>>>>>
> > > >>>>>>>>> Once again, thank you so much for your help.
> > > >>>>>>>>>
> > > >>>>>>>>> Best regards,
> > > >>>>>>>>> TengYao
> > > >>>>>>>>>
> > > >>>>>>>>> Chia-Ping Tsai <chia7...@gmail.com> wrote on Wed, Aug 14, 2024 at 11:14 PM:
> > > >>>>>>>>>
> > > >>>>>>>>>> hi Apoorv
> > > >>>>>>>>>>
> > > >>>>>>>>>>> As the memberId is now known to the client, and client might send the leave group heartbeat on shutdown prior to receiving the initial heartbeat response. If that's true then how do we guarantee that the 2 requests to join and leave will be processed in order, which could still leave stale members or throw unknown member id exceptions?
> > > >>>>>>>>>>
> > > >>>>>>>>>> This is definitely a good question.
> > > >>>>>>>>>> The short answer: no guarantee, but best efforts.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Please notice the root cause is "we do not have enough time to wait for the member id (response) when closing the consumer". Sadly, we can't guarantee the request order for the same reason.
> > > >>>>>>>>>>
> > > >>>>>>>>>> However, in contrast to the previous behavior, there is one big benefit of the new approach: we can try STONITH because we know the member id.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Chia-Ping
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> Apoorv Mittal <apoorvmitta...@gmail.com> wrote on Wed, Aug 14, 2024 at 8:55 PM:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi TengYao,
> > > >>>>>>>>>>> Thanks for the KIP. Continuing on the point which Andrew mentioned as AS1.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> As the memberId is now known to the client, and client might send the leave group heartbeat on shutdown prior to receiving the initial heartbeat response. If that's true then how do we guarantee that the 2 requests to join and leave will be processed in order, which could still leave stale members or throw unknown member id exceptions?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> The client-side member id generation is helpful, as the client and the broker will then share the same view of the group. But I think the major concern we want to solve here is Stale Partition Assignments which might still exist with the new approach.
> > > >>>>>>>>>>> I am leaning towards the suggestion mentioned by Andrew, delayed partition assignment, where partition assignment triggers on the subsequent heartbeat when the client acknowledges the initial heartbeat.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Though on a separate note, I have a different question. What happens when there is an issue with the client which sends the initial heartbeat without memberId, then crashes and restarts? I think we must be re-triggering assignments and expiring members only after the heartbeat session timeout? If that's true, could delayed partition assignment help us in this situation as well?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Regards,
> > > >>>>>>>>>>> Apoorv Mittal
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Wed, Aug 14, 2024 at 12:51 PM David Jacot <dja...@confluent.io.invalid> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> Hi Andrew,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Personally, I don't like the lobby approach. It makes things more complicated and it would require changing the records on the server too. This is why I initially suggested the rejected alternative #2 which is pretty close but also not perfect.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I'd like to clarify one thing. The ConsumerGroupHeartbeat API already supports generating the member id on the client so we don't need any conditional logic on the client side.
> > > >>>>>>>>>>>> This is actually what we wanted to do in the first place but the idea got pushed back by Magnus back then because generating uuid from librdkafka required a new dependency. It turns out that librdkafka has that dependency today. In retrospect, we should have pushed back on this. Long story short, we can just do it. The proposal in this KIP is to make the member id required in future versions. We could also decide not to do it and to keep supporting both approaches. I would also be fine with this.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Best,
> > > >>>>>>>>>>>> David
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield <andrew_schofi...@live.com> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Hi TengYao,
> > > >>>>>>>>>>>>> Thanks for your response. I'll have just one more try to persuade.
> > > >>>>>>>>>>>>> I feel that I will need to follow the approach with KIP-932 when we've made a decision, so I do have more than a passing interest in this.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> A group member in the lobby is in the group, but it does not have any assignments. A member of a consumer group can have no assigned partitions (such as 5 CG members subscribed to a topic with 4 partitions), so it's a situation that consumer group members already expect.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> One of Kafka's strengths is the way that we handle API versioning. But, there is a cost - the behaviour is different depending on the RPC version. KIP-848 is on the cusp of completion, but we're already adding conditional logic for v0/v1 for ConsumerGroupHeartbeat. That's a pity. Only a minor issue, but it's unfortunate.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>> Andrew
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On 14 Aug 2024, at 08:47, TengYao Chi <kiting...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Hello Andrew
> > > >>>>>>>>>>>>>> Thank you for your thoughtful suggestions and getting the discussion going.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> To AS1:
> > > >>>>>>>>>>>>>> In the current scenario where the server generates the UUID, if the client shuts down before receiving the memberId generated by the GC (regardless of whether it's a graceful shutdown or not), the GC will still have to wait for the heartbeat timeout because the client doesn't know its memberId.
> > > >>>>>>>>>>>>>> This KIP indeed cannot completely resolve the idempotency issue, but it can better handle shutdown scenarios under normal circumstances because the client always knows its memberId.
> > > >>>>>>>>>>>>>> Even if the client shuts down immediately after the initial heartbeat, as long as it performs a graceful shutdown and sends a leave heartbeat, the GC can manage the situation and remove the member. Therefore, the goal of this KIP is to address the issue where the GC has to wait for the heartbeat timeout due to the client leaving without knowing its memberId, which leads to reduced throughput and limited scalability.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> The solution you suggest has also been proposed by David. The concern with this approach is that it introduces additional complexity for compatibility, as the new server would not immediately add the member to the group, while the old server would. This requires clients to differentiate whether their memberId has been added to the group or not, which could result in unexpected logs.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Best Regards,
> > > >>>>>>>>>>>>>> TengYao
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Andrew Schofield <andrew_schofi...@live.com> wrote on Wed, Aug 14, 2024 at 12:29 AM:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Hi TengYao,
> > > >>>>>>>>>>>>>>> Thanks for the KIP. I wonder if there's a different way to close what is quite a small window.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> AS1: It is true that the initial heartbeat is not idempotent, but this remains true with this KIP. It's just differently not idempotent. If the client makes its own member ID, sends a request and dies, the GC will still have added the member to the group and it will hang around until the session expires.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> I wonder if the GC could still generate the member ID in response to the first heartbeat, and put the member in a special PENDING state with no assignments until the client sends the next heartbeat, thus confirming it has received the member ID. This would not be a protocol change at all, just a change to the GC to keep a new member in the lobby until it's confirmed it knows its member ID.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>> Andrew
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On 13 Aug 2024, at 15:59, TengYao Chi <kiting...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Hi Chia-Ping,
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Thanks for review and suggestions.
> > > >>>>>>>>>>>>>>>> I have updated the content of KIP accordingly.
> > > >>>>>>>>>>>>>>>> Please take a look.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Best regards,
> > > >>>>>>>>>>>>>>>> TengYao
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Chia-Ping Tsai <chia7...@apache.org> wrote on Tue, Aug 13, 2024 at 9:45 PM:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> hi TengYao
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> thanks for this KIP.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 1) could you please describe the before/after behavior in the "Proposed Changes" section? IIRC, the current RPC allows the HB to carry a member id generated by the client, right? If the HB has no member ID, the server will generate one and then return it. The new behavior will enforce that the HB "must" have a member ID.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 2) could you please write the version number explicitly in the KIP?
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 3) how does the new client code handle the old HB? Does it always generate the member ID on the client side even though that is not restricted?
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>> Chia-Ping
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On 2024/08/13 06:20:42 TengYao Chi wrote:
> > > >>>>>>>>>>>>>>>>>> Hello everyone,
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> I would like to start a discussion thread on KIP-1082, which proposes enabling id generation for clients over the ConsumerGroupHeartbeat RPC.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Here is the KIP Link: KIP-1082
> > > >>>>>>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Please take a look and let me know what you think, and I would appreciate any suggestions and feedback.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Best regards,
> > > >>>>>>>>>>>>>>>>>> TengYao
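
For readers following the LM4 exchange, here is a rough sketch of the member ID lifecycle David describes: the ID is generated once per consumer instance, reused on every heartbeat, and kept when the member is fenced, with only the member epoch reset to zero. It is not the Kafka client implementation. `MemberIdentity` and its methods are hypothetical names invented for illustration, and the 22-character URL-safe Base64 form is an assumption about the encoding discussed under AS2 that should be checked against Kafka's own `Uuid` class.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

/** Hypothetical helper (not a Kafka class): the identity of one consumer instance. */
final class MemberIdentity {
    // Generated once in the constructor and never regenerated ("incarnation id").
    private final String memberId;
    // 0 means "joining"; the coordinator hands out higher epochs afterwards.
    private int memberEpoch = 0;

    MemberIdentity() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        // Assumed encoding: URL-safe Base64 without padding (16 bytes -> 22 chars).
        this.memberId = Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    String memberId() { return memberId; }

    int memberEpoch() { return memberEpoch; }

    /** Record the epoch returned by a successful heartbeat response. */
    void onHeartbeatSuccess(int newEpoch) { memberEpoch = newEpoch; }

    /** On FENCED_MEMBER_EPOCH: keep the same member ID, rejoin with epoch 0. */
    void onFenced() { memberEpoch = 0; }
}
```

Because the ID lives as long as the object, a consumer that unsubscribes and resubscribes, or is fenced, sends its next "first heartbeat" with the same member ID, which is the behavior Lianet asks about.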