Hi Chia-Ping,

> Checks whether the member epoch matches the member epoch in its current assignment. FENCED_MEMBER_EPOCH is returned otherwise. The member is also removed from the group.

We did not implement it like this in the end. We keep the member and only ask it to resynchronize itself by returning the FENCED_MEMBER_EPOCH error. We can remove that statement from the KIP.

> Given that member id is the incarnation of consumer, it is hard to detect the scenario of "two consumers have the same member id" (it may be caused by the lost response). I guess the best approach is the later-created consumer should be idle with fenced error, and we should add this to the KIP (maybe we should revise the error message carried by `FencedMemberEpochException`). BUT, we need to rethink the approach if above description "The member is also removed from the group" is true

I don't think that we have a reliable way to detect whether two members end up using the same member id. I don't think that it is possible if we generate the member id on the client side. By the way, we don't handle this case on the server side today either: we blindly generate a member id and use it. We could do better on the server side, though. For instance, we could check whether the member id is already in use and generate another one if it is. However, I tend to believe that the risk of collision within a group is negligible.

Best,
David

On Fri, Sep 20, 2024 at 8:11 PM Chia-Ping Tsai <chia7...@gmail.com> wrote:

> wait a minute. There is a description about fenced error in the KIP-848
>
> > Checks whether the member epoch matches the member epoch in its current assignment. FENCED_MEMBER_EPOCH is returned otherwise. The member is also removed from the group.
>
> I failed to find the code of implementing "The member is also removed from the group.". I will appreciate it if someone can share the code link to me.
>
> Let's assume it will be implemented eventually, and I feel that will cause a serious issue as the consumers having the same member id will remove the existing member and then create itself member?
>
> > Once the client receives this exception, it should consider the exception as a serious error and stop the process, and the user will need to assign a new memberId.
>
> Given that member id is the incarnation of consumer, it is hard to detect the scenario of "two consumers have the same member id" (it may be caused by the lost response). I guess the best approach is the later-created consumer should be idle with fenced error, and we should add this to the KIP (maybe we should revise the error message carried by `FencedMemberEpochException`). BUT, we need to rethink the approach if above description "The member is also removed from the group" is true
>
> Best,
> Chia-Ping
>
> TengYao Chi <kiting...@gmail.com> wrote on Fri, Sep 20, 2024 at 8:17 PM:
>
> > Hi Chia-Ping,
> >
> > Thanks for pointing out this issue.
> >
> > I’m thinking that maybe we might need to define a new Exception to handle this scenario.
> > Once the client receives this exception, it should consider the exception as a serious error and stop the process, and the user will need to assign a new memberId.
> > What do you think?
> >
> > Best Regards,
> > TengYao
> >
> > Chia-Ping Tsai <chia7...@apache.org> wrote on Fri, Sep 20, 2024 at 7:00 PM:
> >
> > > > This part is not clear either. It basically says that if a member joins with an existing member id but a different epoch, it will be fenced. Then it must rejoin with the same member id and epoch zero. This is already the current behavior and it does not help with detecting duplicates, right?
> > >
> > > The duplicates issue is interesting. The member id is generated on server-side by UUID before, so it seems to me the fenced member epoch happens only if the client misses the response with bumped epoch, and this case is recoverable.
> > >
> > > however, v1 brings a different fenced epoch scenario.
> > > Users, now, have responsibility to generate unique member id, hence clients may encounter infinite fenced error if there are >1 clients having same member id due to config error (or other engineer error)
> > >
> > > Maybe we can highlight this scenario if we start to require clients to generate unique member id.
> > >
> > > Best,
> > > Chia-Ping
> > >
> > > On 2024/09/19 18:36:48 David Jacot wrote:
> > > > Hi,
> > > >
> > > > Thanks for the update. I have a few nits:
> > > >
> > > > > If the member ID is null or empty, the server will reject the request with an InvalidRequestException.
> > > >
> > > > We should clarify that this should only apply to version >= 1.
> > > >
> > > > > The consumer instance must generate a member ID, and this ID should remain consistent for the duration of the consumer's session. Here, a "session" is defined as the period from the consumer's first heartbeat until it leaves the group, either through a graceful shutdown, a heartbeat timeout, or the process stopping or dying. The consumer instance should reuse the same member ID for all heartbeats and rejoin attempts to maintain continuity within the group.
> > > >
> > > > This part is not clear to me. When the member leaves the group, it should not reset the member id. I would rather say that the member must generate its member id when it starts and it must keep it until the process stops. It is basically an incarnation of the process.
> > > >
> > > > > If a conflict arises where the member ID generated by the client is detected to be a duplicate within the same group (for example, the same member ID is associated with another active member in the group), the server will handle this by comparing the memberEpoch values of the conflicting members.
> > > > > The member with the lower memberEpoch is considered outdated and will be fenced off by the server. When this occurs, the server responds with a FENCED_MEMBER_EPOCH error to the client, signaling it to rejoin the group with the same member ID while resetting the memberEpoch to zero. This ensures that the client properly resynchronizes and maintains the continuity and consistency of the group membership.
> > > >
> > > > This part is not clear either. It basically says that if a member joins with an existing member id but a different epoch, it will be fenced. Then it must rejoin with the same member id and epoch zero. This is already the current behavior and it does not help with detecting duplicates, right? Should we just remove the paragraph?
> > > >
> > > > > A member ID mismatch occurs within a session: If the server detects a mismatch between the provided member ID and the expected member ID for an ongoing session, it should return a UNKNOWN_MEMBER_ID error.
> > > >
> > > > How could we detect a mismatch between the provided and the expected member id? My understanding is that we can only know whether the provided member id exists or not. This is already implemented.
> > > >
> > > > Thanks,
> > > > David
> > > >
> > > > On Sat, Sep 14, 2024 at 9:31 AM TengYao Chi <kiting...@gmail.com> wrote:
> > > >
> > > > > Hello everyone,
> > > > >
> > > > > Since this KIP has been fully discussed, I will initiate a vote for it next Monday.
> > > > > Thank you and have a nice weekend.
> > > > >
> > > > > Best regards,
> > > > > TengYao
> > > > >
> > > > > TengYao Chi <kiting...@gmail.com> wrote on Thu, Sep 5, 2024 at 2:19 PM:
> > > > >
> > > > > > Hello everyone,
> > > > > >
> > > > > > KT2: It looks like everyone who has expressed an opinion supports the second option: “Document a recommendation for clients to use UUIDs as member IDs, without strictly enforcing it.”
> > > > > > I have updated the KIP accordingly.
> > > > > > Please take a look, and let me know if you have any thoughts or feedback.
> > > > > >
> > > > > > Thank you!
> > > > > >
> > > > > > Best regards,
> > > > > > TengYao
> > > > > >
> > > > > > Chia-Ping Tsai <chia7...@gmail.com> wrote on Fri, Aug 30, 2024 at 9:56 PM:
> > > > > >
> > > > > >> hi TengYao
> > > > > >>
> > > > > >> KT2: +1 to second approach
> > > > > >>
> > > > > >> Best,
> > > > > >> Chia-Ping
> > > > > >>
> > > > > >>
> > > > > >> David Jacot <dja...@confluent.io.invalid> wrote on Fri, Aug 30, 2024 at 9:15 PM:
> > > > > >>
> > > > > >> > Hi TengYao,
> > > > > >> >
> > > > > >> > KT2: I don't think that we can realistically validate the uuid on the server. It is basically a string of chars. So I lean towards having a good recommendation in the KIP and in the document of the field in the RPC's definition.
> > > > > >> >
> > > > > >> > Best,
> > > > > >> > David
> > > > > >> >
> > > > > >> > On Fri, Aug 30, 2024 at 3:02 PM TengYao Chi <kiting...@gmail.com> wrote:
> > > > > >> >
> > > > > >> > > Hello Kirk !
> > > > > >> > >
> > > > > >> > > Thank you for your comments !
> > > > > >> > >
> > > > > >> > > KT1: Yes, you are correct. The issue is not unique to the initial heartbeat; there can always be cases where the broker might lose connection with a member.
> > > > > >> > >
> > > > > >> > > KT2: Currently, if the client doesn't have a member ID and the memberEpoch equals 0, the coordinator will generate a UUID as the member ID for the client. However, at the RPC level, the member ID is sent as a literal string, meaning there are no restrictions on the format at this level.
> > > > > >> > > This also reminds me that we haven't reached a final conclusion on how to enforce the use of UUIDs.
> > > > > >> > > From our previous discussions, I recall two possible approaches:
> > > > > >> > > The first is to validate the UUID on the server side, and if it's not valid, throw an exception to the client.
> > > > > >> > > The second is to document a recommendation for clients to use UUIDs as member IDs, without strictly enforcing it.
> > > > > >> > > I think it's time to decide on the approach we want to take.
> > > > > >> > >
> > > > > >> > > KT3: Yes, "session" can be considered synonymous with "membership" in this context.
> > > > > >> > >
> > > > > >> > > KT4: Thank you for pointing that out. I will update the wording to specifically say this behavior is for consumers.
> > > > > >> > >
> > > > > >> > > Thanks again for your comments.
> > > > > >> > >
> > > > > >> > > Best regards,
> > > > > >> > > TengYao
> > > > > >> > >
> > > > > >> > > Kirk True <k...@kirktrue.pro> wrote on Fri, Aug 30, 2024 at 12:39 AM:
> > > > > >> > >
> > > > > >> > > > Hi TengYao!
> > > > > >> > > >
> > > > > >> > > > Sorry for being late to the discussion...
> > > > > >> > > >
> > > > > >> > > > After reading the thread and then the KIP, I had a few questions/comments:
> > > > > >> > > >
> > > > > >> > > > KT1: In Motivation, it states: “This scenario can result in the broker registering a new member for which it will never receive a proper leave request.” Just to be clear, the broker will always have cases where it might lose connection with a member. That’s not unique to the initial heartbeat, right?
> > > > > >> > > >
> > > > > >> > > > KT2: There was a bit of back and forth about format of the member ID. From what I gathered in the thread, the member ID is still defined in the RPC as a string and not a UUID, right? The KIP states that the “client must generate a UUID as the member ID” and that the “server will validate that a valid UUID is provided.” Is that a change for the server, or is it already enforced as a UUID?
> > > > > >> > > >
> > > > > >> > > > KT3: Lianet mentioned some confusion over the use of the word “session.” Isn’t “session” synonymous with “membership?”
> > > > > >> > > >
> > > > > >> > > > KT4: Under “Member ID Lifecycle,” it states: “The client should reuse the same UUID as the member ID for all heartbeats and rejoin attempts to maintain continuity within the group.” Could we change the first part of that to “The Consumer instance should…” We do have lifetimes that extend past the lifetime of a client instance (such as the transaction ID).
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Kirk
> > > > > >> > > >
> > > > > >> > > > > On Aug 29, 2024, at 1:28 AM, TengYao Chi <kiting...@gmail.com> wrote:
> > > > > >> > > > >
> > > > > >> > > > > Hi David,
> > > > > >> > > > >
> > > > > >> > > > > Thank you for pointing that out.
> > > > > >> > > > > I have updated the content of the KIP based on Lianet's and your feedback.
> > > > > >> > > > > Please take a look and let me know your thoughts.
> > > > > >> > > > >
> > > > > >> > > > > Best regards,
> > > > > >> > > > > TengYao
> > > > > >> > > > >
> > > > > >> > > > > David Jacot <dja...@confluent.io.invalid> wrote on Thu, Aug 29, 2024 at 3:20 PM:
> > > > > >> > > > >
> > > > > >> > > > >> Hi TengYao,
> > > > > >> > > > >>
> > > > > >> > > > >> Thanks for the update. I haven't fully read it yet but I will soon.
> > > > > >> > > > >>
> > > > > >> > > > >> LM4: This is incorrect. The consumer must keep its member id during its entire lifetime (until the process stops or dies).
> > > > > >> > > > >> The protocol stipulates that a member must rejoin with the same member id and the member epoch set to zero when a FENCED_MEMBER_EPOCH error occurs. This allows the member to resynchronize itself. We should not change this behavior. I think that we should see the client-side generated id as an incarnation id of the application. It is generated once and kept until it stops or dies.
> > > > > >> > > > >>
> > > > > >> > > > >> Best,
> > > > > >> > > > >> David
> > > > > >> > > > >>
> > > > > >> > > > >> On Thu, Aug 29, 2024 at 6:21 AM TengYao Chi <kiting...@gmail.com> wrote:
> > > > > >> > > > >>
> > > > > >> > > > >>> Hello Lianet !
> > > > > >> > > > >>>
> > > > > >> > > > >>> Thanks for the reviews and suggestions!
> > > > > >> > > > >>>
> > > > > >> > > > >>> LM1: Indeed, we plan to enforce client-side ID generation in the future, and it is not an alternative. I will change the title accordingly.
> > > > > >> > > > >>>
> > > > > >> > > > >>> LM2: Yes, that's the expectation. I will add that statement to the public interface section.
> > > > > >> > > > >>>
> > > > > >> > > > >>> LM3: Thank you for the high-level perspective review. I think you're right; our intention isn't very clear since it was placed at the end of the section. I will try to rephrase that section to make it more obvious.
> > > > > >> > > > >>>
> > > > > >> > > > >>> LM4: Regarding the definition of "session" in this KIP, I believe it refers to the period between the *first-time heartbeat* and when the *consumer leaves the group* (whether through a graceful shutdown or a heartbeat timeout). The consumer should reuse its UUID if it has been generated before. The only situation in which it will regenerate the UUID is if the coordinator finds that there is already a consumer with the same UUID. IIRC, the coordinator should compare the member epochs, and the later-joined consumer should be fenced off by the coordinator due to having a lower member epoch. Once the consumer receives a `FENCED_MEMBER_EPOCH` error, it will generate a new UUID and attempt to rejoin. I will clarify this in the KIP.
> > > > > >> > > > >>>
> > > > > >> > > > >>> Thanks again for your reviews, I really appreciate it.
> > > > > >> > > > >>>
> > > > > >> > > > >>> Best regards,
> > > > > >> > > > >>> TengYao
> > > > > >> > > > >>>
> > > > > >> > > > >>> Lianet M. <liane...@gmail.com> wrote on Wed, Aug 28, 2024 at 7:12 PM:
> > > > > >> > > > >>>
> > > > > >> > > > >>>> Hello TengYao! Thanks for taking on this issue, we've been going around it for a while.
> > > > > >> > > > >>>>
> > > > > >> > > > >>>> LM1: About the title of the KIP: "Enable ID Generation for Clients over the ConsumerGroupHeartbeat RPC". I find it confusing because it hints that we're adding it as an alternative (which was discussed and discarded, in favour of really enforcing it). It's also missing the core change imo, which is "where" the generation happens. So, maybe more to the point with something along the lines of "Client-side generated ID for clients over ConsumerGroupHeartbeat RPC"?
> > > > > >> > > > >>>>
> > > > > >> > > > >>>> LM2: On the public interfaces section, the KIP states that "the server will reject the request", but we should agree on the specific error type. I expect it should fail with an InvalidRequestException, is that the intention? (This was also suggested in the discussion thread before but is not in the KIP).
> > > > > >> > > > >>>>
> > > > > >> > > > >>>> LM3.
> > > > > >> > > > >>>> Related to my previous point, I find that to be the true public-facing change (member ID mandatory at the protocol level), but it's only at the end of the Public interfaces changes, kind of lost among details of how we're going to do it. Should we rephrase that section with the actual change first, and the hows after? (ex. Bumping the version is not the public-facing change in this case, it's just the mechanism to properly introduce our change)
> > > > > >> > > > >>>>
> > > > > >> > > > >>>> LM4. Regarding the lifetime of the UUID: the KIP states we will "Verify that the UUID remains consistent across all subsequent heartbeats during the session". What is this "session" referring to here? I would expect that the UUID is associated to a consumer instance (generated for the consumer the first time it needs to send a HB if it doesn't have the UUID yet). From there on, every time it needs to send a "first HB" again, it will reuse its UUID, is that the intention?
> > > > > >> > > > >>>> Note that we should consider that the same consumer instance may have many "first heartbeats", meaning heartbeats to join the group when it's not part of it (ex. consumer unsubscribe + subscribe, fenced, stale). Is this the intention or are you considering the lifetime differently? We should clarify it in the KIP.
> > > > > >> > > > >>>>
> > > > > >> > > > >>>> Thanks!
> > > > > >> > > > >>>>
> > > > > >> > > > >>>> Lianet
> > > > > >> > > > >>>>
> > > > > >> > > > >>>> On Tue, Aug 27, 2024 at 2:27 AM TengYao Chi <kiting...@gmail.com> wrote:
> > > > > >> > > > >>>>
> > > > > >> > > > >>>>> Hi everyone,
> > > > > >> > > > >>>>>
> > > > > >> > > > >>>>> I have revised this KIP multiple times based on the feedback from our discussions.
> > > > > >> > > > >>>>> I would greatly appreciate it if you could review it when you have the time.
> > > > > >> > > > >>>>> If there are no further comments or suggestions, I plan to proceed with initiating a vote soon.
> > > > > >> > > > >>>>>
> > > > > >> > > > >>>>> Best regards,
> > > > > >> > > > >>>>> TengYao
> > > > > >> > > > >>>>>
> > > > > >> > > > >>>>> TengYao Chi <kiting...@gmail.com> wrote on Fri, Aug 23, 2024 at 2:43 PM:
> > > > > >> > > > >>>>>
> > > > > >> > > > >>>>>> Hi Andrew,
> > > > > >> > > > >>>>>> Thank you for your previous feedback and insights.
> > > > > >> > > > >>>>>> Your contributions have added valuable perspectives to the discussions.
> > > > > >> > > > >>>>>> And we also benefit from the comparison of different solutions.
> > > > > >> > > > >>>>>> I’m also looking forward to seeing an initial version in KIP-932, as it will provide a good reference for future implementations.
> > > > > >> > > > >>>>>>
> > > > > >> > > > >>>>>> Regarding your comment on AS2, I wanted to clarify that my specification references org.apache.kafka.common.Uuid.
> > > > > >> > > > >>>>>> I believe we’re referring to the same class, and it might just be a small oversight due to the busy schedule.
> > > > > >> > > > >>>>>>
> > > > > >> > > > >>>>>> I want to express my gratitude once again for your many insightful comments, which have helped the discussion progress smoothly.
> > > > > >> > > > >>>>>>
> > > > > >> > > > >>>>>> Best regards,
> > > > > >> > > > >>>>>> TengYao
> > > > > >> > > > >>>>>>
> > > > > >> > > > >>>>>>
> > > > > >> > > > >>>>>> Andrew Schofield <andrew_schofi...@live.com> wrote on Thu, Aug 22, 2024 at 11:28 PM:
> > > > > >> > > > >>>>>>
> > > > > >> > > > >>>>>>> Hi TengYao,
> > > > > >> > > > >>>>>>> I’ve been reading through the comments and I’m happy that the lobby approach has not gained support.
> > > > > >> > > > >>>>>>>
> > > > > >> > > > >>>>>>> Assuming that this KIP is voted, I will be happy to change KIP-932 so that it only supports client-side member ID generation.
> > > > > >> > > > >>>>>>> Because that KIP is still under development, I can do this in the first version of ShareGroupHeartbeat.
> > > > > >> > > > >>>>>>>
> > > > > >> > > > >>>>>>> AS2: For the encoding section, I suppose the specific encoding which is used is what org.apache.kafka.utils.Uuid uses.
> > > > > >> > > > >>>>>>>
> > > > > >> > > > >>>>>>> Thanks,
> > > > > >> > > > >>>>>>> Andrew
> > > > > >> > > > >>>>>>>
> > > > > >> > > > >>>>>>>> On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com> wrote:
> > > > > >> > > > >>>>>>>>
> > > > > >> > > > >>>>>>>> Hello Apoorv,
> > > > > >> > > > >>>>>>>> Thank you for your feedback.
> > > > > >> > > > >>>>>>>> Regarding the questions you raised, unfortunately, this KIP cannot guarantee the order of heartbeats. As with many classic distributed system challenges, what we can do is make our best effort to ensure that there are no idle members or stale assignments under normal circumstances.
> > > > > >> > > > >>>>>>>>
> > > > > >> > > > >>>>>>>> As for the lobby approach, I’m not a fan of it because it requires adding a mechanism to maintain client state within the ConsumerGroup, which, in my view, resembles something like a two-phase commit.
> > > > > >> > > > >>>>>>>> This would introduce more complexity than the proposal in this KIP, which is something we want to avoid. KIP-848 aims to simplify the existing protocol, and while the lobby approach is a good one, I believe it is not the right fit for this particular situation.
> > > > > >> > > > >>>>>>>>
> > > > > >> > > > >>>>>>>> Best regards,
> > > > > >> > > > >>>>>>>> TengYao
> > > > > >> > > > >>>>>>>>
> > > > > >> > > > >>>>>>>> TengYao Chi <kiting...@gmail.com> wrote on Wed, Aug 14, 2024 at 11:45 PM:
> > > > > >> > > > >>>>>>>>
> > > > > >> > > > >>>>>>>>> Hi David,
> > > > > >> > > > >>>>>>>>>
> > > > > >> > > > >>>>>>>>> I really appreciate your review and suggestions. As I am still gaining experience in writing KIPs, your input has been incredibly helpful. I am currently applying your suggestions to the KIP and will complete it as soon as possible.
> > > > > >> > > > >>>>>>>>> Regarding the UUID part, I think we haven’t reached a conclusion yet (so far, according to this thread).
> > > > > >> > > > >>>>>>>>> However, I will review the current implementation in the Kafka `Uuid` class and include a brief specification in the KIP.
> > > > > >> > > > >>>>>>>>>
> > > > > >> > > > >>>>>>>>> Once again, thank you so much for your help.
> > > > > >> > > > >>>>>>>>>
> > > > > >> > > > >>>>>>>>> Best regards,
> > > > > >> > > > >>>>>>>>> TengYao
> > > > > >> > > > >>>>>>>>>
> > > > > >> > > > >>>>>>>>> Chia-Ping Tsai <chia7...@gmail.com> wrote on Wed, Aug 14, 2024 at 11:14 PM:
> > > > > >> > > > >>>>>>>>>
> > > > > >> > > > >>>>>>>>>> hi Apoorv
> > > > > >> > > > >>>>>>>>>>
> > > > > >> > > > >>>>>>>>>>> As the memberId is now known to the client, and client might send the leave group heartbeat on shutdown prior to receiving the initial heartbeat response. If that's true then how do we guarantee that the 2 requests to join and leave will be processed in order, which could still leave stale members or throw unknown member id exceptions?
> > > > > >> > > > >>>>>>>>>>
> > > > > >> > > > >>>>>>>>>> This is definitely a good question. the short answer: no guarantee but best efforts
> > > > > >> > > > >>>>>>>>>>
> > > > > >> > > > >>>>>>>>>> Please notice the root cause is "we have no enough time to wait member id (response) when closing consumer". Sadly, we can't guarantee the request order due to the same reason.
> > > > > >> > > > >>>>>>>>>>
> > > > > >> > > > >>>>>>>>>> However, in contrast to previous behavior, there is one big benefit of new approach - we can try STONITH because we know the member id
> > > > > >> > > > >>>>>>>>>>
> > > > > >> > > > >>>>>>>>>> Best,
> > > > > >> > > > >>>>>>>>>> Chia-Ping
> > > > > >> > > > >>>>>>>>>>
> > > > > >> > > > >>>>>>>>>>
> > > > > >> > > > >>>>>>>>>> Apoorv Mittal <apoorvmitta...@gmail.com> wrote on Wed, Aug 14, 2024 at 8:55 PM:
> > > > > >> > > > >>>>>>>>>>
> > > > > >> > > > >>>>>>>>>>> Hi TengYao,
> > > > > >> > > > >>>>>>>>>>> Thanks for the KIP. Continuing on the point which Andrew mentioned as AS1.
> > > > > >> > > > >>>>>>>>>>>
> > > > > >> > > > >>>>>>>>>>> As the memberId is now known to the client, and client might send the leave group heartbeat on shutdown prior to receiving the initial heartbeat response. If that's true then how do we guarantee that the 2 requests to join and leave will be processed in order, which could still leave stale members or throw unknown member id exceptions?
>
> Client-side member id generation is helpful, since it gives the client and the broker the same view of the group. But I think the major concern we want to solve here is stale partition assignments, which might still exist with the new approach. I am leaning towards the suggestion mentioned by Andrew, where partition assignment is triggered on the subsequent heartbeat, once the client has acknowledged the initial heartbeat (delayed partition assignment).
>
> On a separate note, I have a different question. What happens when there is an issue with the client which sends the initial heartbeat without memberId, then crashes and restarts? I think we must be re-triggering assignments and expiring members only after the heartbeat session timeout?
> If that's true, could delayed partition assignment help us in this situation as well?
>
> Regards,
> Apoorv Mittal
>
> On Wed, Aug 14, 2024 at 12:51 PM David Jacot <dja...@confluent.io.invalid> wrote:
>
> Hi Andrew,
>
> Personally, I don't like the lobby approach. It makes things more complicated and it would require changing the records on the server too. This is why I initially suggested the rejected alternative #2 which is pretty close but also not perfect.
>
> I'd like to clarify one thing. The ConsumerGroupHeartbeat API already supports generating the member id on the client so we don't need any conditional logic on the client side.
> This is actually what we wanted to do in the first place but the idea got pushed back by Magnus back then because generating uuid from librdkafka required a new dependency. It turns out that librdkafka has that dependency today. In retrospect, we should have pushed back on this. Long story short, we can just do it. The proposal in this KIP is to make the member id required in future versions. We could also decide not to do it and to keep supporting both approaches. I would also be fine with this.
>
> Best,
> David
>
> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield <andrew_schofi...@live.com> wrote:
>
> Hi TengYao,
> Thanks for your response. I’ll have just one more try to persuade.
> I feel that I will need to follow the approach with KIP-932 when we’ve made a decision, so I do have more than a passing interest in this.
>
> A group member in the lobby is in the group, but it does not have any assignments. A member of a consumer group can have no assigned partitions (such as 5 CG members subscribed to a topic with 4 partitions), so it’s a situation that consumer group members already expect.
>
> One of Kafka’s strengths is the way that we handle API versioning. But, there is a cost - the behaviour is different depending on the RPC version. KIP-848 is on the cusp of completion, but we’re already adding conditional logic for v0/v1 for ConsumerGroupHeartbeat. That’s a pity. Only a minor issue, but it’s unfortunate.
>
> Thanks,
> Andrew
>
> On 14 Aug 2024, at 08:47, TengYao Chi <kiting...@gmail.com> wrote:
>
> Hello Andrew
> Thank you for your thoughtful suggestions and getting the discussion going.
>
> To AS1:
> In the current scenario where the server generates the UUID, if the client shuts down before receiving the memberId generated by the GC (regardless of whether it’s a graceful shutdown or not), the GC will still have to wait for the heartbeat timeout because the client doesn’t know its memberId.
> This KIP indeed cannot completely resolve the idempotency issue, but it can better handle shutdown scenarios under normal circumstances because the client always knows its memberId. Even if the client shuts down immediately after the initial heartbeat, as long as it performs a graceful shutdown and sends a leave heartbeat, the GC can manage the situation and remove the member. Therefore, the goal of this KIP is to address the issue where the GC has to wait for the heartbeat timeout due to the client leaving without knowing its memberId, which leads to reduced throughput and limited scalability.
>
> The solution you suggest has also been proposed by David.
> The concern with this approach is that it introduces additional complexity for compatibility, as the new server would not immediately add the member to the group, while the old server would. This requires clients to differentiate whether their memberId has been added to the group or not, which could result in unexpected logs.
>
> Best Regards,
> TengYao
>
> Andrew Schofield <andrew_schofi...@live.com> wrote on Wed, Aug 14, 2024 at 12:29 AM:
>
> Hi TengYao,
> Thanks for the KIP. I wonder if there’s a different way to close what is quite a small window.
>
> AS1: It is true that the initial heartbeat is not idempotent, but this remains true with this KIP. It’s just differently not idempotent.
> If the client makes its own member ID, sends a request and dies, the GC will still have added the member to the group and it will hang around until the session expires.
>
> I wonder if the GC could still generate the member ID in response to the first heartbeat, and put the member in a special PENDING state with no assignments until the client sends the next heartbeat, thus confirming it has received the member ID. This would not be a protocol change at all, just a change to the GC to keep a new member in the lobby until it’s confirmed it knows its member ID.
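
For illustration only, the lobby idea described above could be sketched roughly as follows. This is a toy model, not Kafka code: `LobbyCoordinator`, `MemberState`, `heartbeat` and `assignable` are hypothetical names, and the real group coordinator also tracks epochs, sessions and records that are left out here.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class MemberState(Enum):
    PENDING = "pending"  # in the lobby: known to the coordinator, no assignments
    ACTIVE = "active"    # confirmed: eligible for partition assignment


@dataclass
class LobbyCoordinator:
    """Toy stand-in for the group coordinator (GC); hypothetical, not Kafka code."""
    members: dict = field(default_factory=dict)

    def heartbeat(self, member_id: str = "") -> str:
        if not member_id:
            # First heartbeat: generate the id and park the member in the lobby.
            member_id = str(uuid.uuid4())
            self.members[member_id] = MemberState.PENDING
            return member_id
        if member_id not in self.members:
            raise KeyError("unknown member id")
        # A later heartbeat carrying the id proves the client received it.
        self.members[member_id] = MemberState.ACTIVE
        return member_id

    def assignable(self) -> list:
        # Only confirmed members take part in partition assignment.
        return [m for m, s in self.members.items() if s is MemberState.ACTIVE]
```

In this sketch a client that dies before its second heartbeat never receives partitions, so nothing has to be reassigned when its session expires.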
>
> Thanks,
> Andrew
>
> On 13 Aug 2024, at 15:59, TengYao Chi <kiting...@gmail.com> wrote:
>
> Hi Chia-Ping,
>
> Thanks for the review and suggestions.
> I have updated the content of the KIP accordingly.
> Please take a look.
>
> Best regards,
> TengYao
>
> Chia-Ping Tsai <chia7...@apache.org> wrote on Tue, Aug 13, 2024 at 9:45 PM:
>
> hi TengYao
>
> thanks for this KIP.
>
> 1) could you please describe the before/after behavior in the "Proposed Changes" section? IIRC, the current RPC allows the HB to carry a member id generated by the client, right? If the HB has no member ID, the server will generate one and then return it.
> The new behavior will enforce that the HB "must" have a member ID.
>
> 2) could you please write the version number explicitly in the KIP
>
> 3) how does the new client code handle the old HB? Does it always generate the member ID on the client side even though that is not restricted?
>
> Best,
> Chia-Ping
>
> On 2024/08/13 06:20:42 TengYao Chi wrote:
> Hello everyone,
>
> I would like to start a discussion thread on KIP-1082, which proposes enabling id generation for clients over the ConsumerGroupHeartbeat RPC.
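
As a rough illustration of why a client-generated id helps on shutdown, here is a minimal sketch. It is a toy model under stated assumptions, not the Kafka client: `ToyCoordinator`, `ToyConsumer` and `LEAVE_EPOCH` are hypothetical names, and using a negative epoch to signal "leave" is an assumption of this sketch.

```python
import uuid

LEAVE_EPOCH = -1  # assumption for this sketch: a negative epoch means "leave"


class ToyCoordinator:
    """Hypothetical coordinator that only tracks which member ids are present."""

    def __init__(self):
        self.members = set()

    def heartbeat(self, member_id, member_epoch):
        if member_epoch == LEAVE_EPOCH:
            # Leave request: remove the member right away, no session timeout.
            self.members.discard(member_id)
        else:
            self.members.add(member_id)


class ToyConsumer:
    """Hypothetical client that picks its own member id before any request."""

    def __init__(self, coordinator):
        self.member_id = str(uuid.uuid4())  # known before the first heartbeat
        self.coordinator = coordinator

    def join(self):
        self.coordinator.heartbeat(self.member_id, 0)

    def close(self):
        # The id is known even if no heartbeat response was ever received.
        self.coordinator.heartbeat(self.member_id, LEAVE_EPOCH)
```

Because the id exists before the first request, `close()` can always name the member it wants removed. The ordering of the join and leave requests is still not guaranteed, as discussed elsewhere in this thread.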
>
> Here is the KIP Link: KIP-1082
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC>
>
> Please take a look and let me know what you think, and I would appreciate any suggestions and feedback.
>
> Best regards,
> TengYao