Hi TengYao,

KT2: I don't think that we can realistically validate the uuid on the
server. It is basically a string of chars. So I lean towards having a good
recommendation in the KIP and in the document of the field in the RPC's
definition.

Best,
David

On Fri, Aug 30, 2024 at 3:02 PM TengYao Chi <kiting...@gmail.com> wrote:

> Hello Kirk !
>
> Thank you for your comments !
>
> KT1: Yes, you are correct. The issue is not unique to the initial
> heartbeat; there can always be cases where the broker might lose connection
> with a member.
>
> KT2: Currently, if the client doesn't have a member ID and the memberEpoch
> equals 0, the coordinator will generate a UUID as the member ID for the
> client. However, at the RPC level, the member ID is sent as a literal
> string, meaning there are no restrictions on the format at this level.
> This also reminds me that we haven't reached a final conclusion on how to
> enforce the use of UUIDs.
> From our previous discussions, I recall two possible approaches:
> The first is to validate the UUID on the server side, and if it's not
> valid, throw an exception to the client.
> The second is to document a recommendation for clients to use UUIDs as
> member IDs, without strictly enforcing it.
> I think it's time to decide on the approach we want to take.
>
> KT3: Yes, "session" can be considered synonymous with "membership" in this
> context.
>
> KT4: Thank you for pointing that out. I will update the wording to
> specifically say this behavior is for consumers.
>
> Thanks again for your comments.
>
> Best regards,
> TengYao
>
> Kirk True <k...@kirktrue.pro> 於 2024年8月30日 週五 上午12:39寫道:
>
> > Hi TengYao!
> >
> > Sorry for being late to the discussion...
> >
> > After reading the thread and then the KIP, I had a few
> questions/comments:
> >
> > KT1: In Motivation, it states: "This scenario can result in the broker
> > registering a new member for which it will never receive a proper leave
> > request.” Just to be clear, the broker will always have cases where it
> > might lose connection with a member. That’s not unique to the initial
> > heartbeat, right?
> >
> > KT2: There was a bit of back and forth about format of the member ID.
> From
> > what I gathered in the thread, the member ID is still defined in the RPC
> as
> > a string and not a UUID, right? The KIP states that the “client must
> > generate a UUID as the member ID” and that the “server will validate
> that a
> > valid UUID is provided.” Is that a change for the server, or is it
> already
> > enforced as a UUID?
> >
> > KT3: Lianet mentioned some confusion over the use of the word “session.”
> > Isn’t “session” synonymous with “membership?”
> >
> > KT4: Under “Member ID Lifecycle,” it states: "The client should reuse the
> > same UUID as the member ID for all heartbeats and rejoin attempts to
> > maintain continuity within the group.” Could we change the first part of
> > that to “The Consumer instance should…” We do have lifetimes that extend
> > past the lifetime of a client instance (such as the transaction ID).
> >
> > Thanks,
> > Kirk
> >
> > > On Aug 29, 2024, at 1:28 AM, TengYao Chi <kiting...@gmail.com> wrote:
> > >
> > > Hi David,
> > >
> > > Thank you for pointing that out.
> > > I have updated the content of the KIP based on Lianet's and your
> > feedback.
> > > Please take a look and let me know your thoughts.
> > >
> > > Best regards,
> > > TengYao
> > >
> > > David Jacot <dja...@confluent.io.invalid> 於 2024年8月29日 週四 下午3:20寫道:
> > >
> > >> Hi TengYao,
> > >>
> > >> Thanks for the update. I haven't fully read it yet but I will soon.
> > >>
> > >> LM4: This is incorrect. The consumer must keep its member id during
> its
> > >> entire lifetime (until the process stops or dies). The protocol
> > stipulates
> > >> that a member must rejoin with the same member id and the member epoch
> > set
> > >> to zero when an FENCED_MEMBER_EPOCH occurs. This allows the member to
> > >> resynchronize itself. We should not change this behavior. I think that
> > we
> > >> should see the client side generation id as an incarnation id of the
> > >> application. It is generated once and kept until it stops or dies.
> > >>
> > >> Best,
> > >> David
> > >>
> > >> On Thu, Aug 29, 2024 at 6:21 AM TengYao Chi <kiting...@gmail.com>
> > wrote:
> > >>
> > >>> Hello Lianet !
> > >>>
> > >>> Thanks for the reviews and suggestions!
> > >>>
> > >>> LM1: Indeed, we plan to enforce client-side ID generation in the
> > future,
> > >>> and it is not an alternative. I will change the title accordingly.
> > >>>
> > >>> LM2: Yes, that's the expectation. I will add that statement to the
> > public
> > >>> interface section.
> > >>>
> > >>> LM3: Thank you for the high-level perspective review. I think you're
> > >> right;
> > >>> our intention isn't very clear since it was placed at the end of the
> > >>> section. I will try to rephrase that section to make it more obvious.
> > >>>
> > >>> LM4: Regarding the definition of "session" in this KIP, I believe it
> > >> refers
> > >>> to the period between the *first-time heartbeat* and when the
> *consumer
> > >>> leaves the group* (whether through a graceful shutdown or a heartbeat
> > >>> timeout). The consumer should reuse its UUID if it has been generated
> > >>> before. The only situation in which it will regenerate the UUID is if
> > the
> > >>> coordinator finds that there is already a consumer with the same
> UUID.
> > >>> IIRC, the coordinator should compare the member epochs, and the
> > >>> later-joined consumer should be fenced off by the coordinator due to
> > >> having
> > >>> a lower member epoch. Once the consumer receives a
> > `FENCED_MEMBER_EPOCH`
> > >>> error, it will generate a new UUID and attempt to rejoin. I will
> > clarify
> > >>> this in the KIP.
> > >>>
> > >>> Thanks again for your reviews, I really appreciate it.
> > >>>
> > >>> Best regards,
> > >>> TengYao
> > >>>
> > >>> Lianet M. <liane...@gmail.com> 於 2024年8月28日 週三 下午7:12寫道:
> > >>>
> > >>>> Hello TengYao! Thanks for taking on this issue, we've been going
> > around
> > >>> it
> > >>>> for a while.
> > >>>>
> > >>>> LM1: About the title of the KIP: "Enable ID Generation for Clients
> > over
> > >>> the
> > >>>> ConsumerGroupHeartbeat RPC". I find it confusing because it hints
> that
> > >>>> we're adding it as an alternative (which was discussed and
> discarded,
> > >> in
> > >>>> favour of really enforcing it). It's also missing the core change
> imo,
> > >>>> which is "where" the generation happens. So, maybe more to the point
> > >> with
> > >>>> something along the lines of "Client-side generated ID for clients
> > over
> > >>>> ConsumerGroupHeartbeat RPC"?
> > >>>>
> > >>>> LM2: On the public interfaces section, the KIP states that "the
> server
> > >>> will
> > >>>> reject the request", but we should agree on the specific error
> type. I
> > >>>> expect it should fail with an InvalidRequestException, is that the
> > >>>> intention? (This was also suggested in the discussion thread before
> > but
> > >>> is
> > >>>> not in the KIP).
> > >>>>
> > >>>> LM3. Related to my previous point, I find that to be the true
> > >>> public-facing
> > >>>> change (member ID mandatory at the protocol level), but it's only at
> > >> the
> > >>>> end of the Public interfaces changes, kind of lost among details of
> > how
> > >>>> we're going to do it. Should we rephrase that section with the
> actual
> > >>>> change first, and the hows after (ex. Bumping the version is not the
> > >>>> public-facing change in this case, it's just the mechanism to
> properly
> > >>>> introduce our change)
> > >>>>
> > >>>> LM4. Regarding the lifetime of the UUID: the KIP states we will
> > "Verify
> > >>>> that the UUID remains consistent across all subsequent heartbeats
> > >> during
> > >>>> the session". What is this "session" referring to here? I would
> expect
> > >>> that
> > >>>> the UUID is associated to a consumer instance (generated for the
> > >> consumer
> > >>>> the first time it needs to send a HB if it doesn't have the UUID
> yet.
> > >>> From
> > >>>> there on, every time it needs to send a "first HB" again, it will
> > reuse
> > >>> its
> > >>>> UUID, is that the intention? Note that we should consider that the
> > same
> > >>>> consumer instance may have many "first heartbeats", meaning
> heartbeats
> > >> to
> > >>>> join the group when it's not part of it (ex. consumer unsubscribe +
> > >>>> subscribe, fenced, stale). Is this the intention or are you
> > considering
> > >>> the
> > >>>> lifetime differently? We should clarify it in the KIP.
> > >>>>
> > >>>> Thanks!
> > >>>>
> > >>>> Lianet
> > >>>>
> > >>>> On Tue, Aug 27, 2024 at 2:27 AM TengYao Chi <kiting...@gmail.com>
> > >> wrote:
> > >>>>
> > >>>>> Hi everyone,
> > >>>>>
> > >>>>> I have revised this KIP multiple times based on the feedback from
> our
> > >>>>> discussions.
> > >>>>> I would greatly appreciate it if you could review it when you have
> > >> the
> > >>>>> time.
> > >>>>> If there are no further comments or suggestions, I plan to proceed
> > >> with
> > >>>>> initiating a vote soon.
> > >>>>>
> > >>>>> Best regards,
> > >>>>> TengYao
> > >>>>>
> > >>>>> TengYao Chi <kiting...@gmail.com> 於 2024年8月23日 週五 下午2:43寫道:
> > >>>>>
> > >>>>>> Hi Andrew,
> > >>>>>> Thank you for your previous feedback and insights.
> > >>>>>> Your contributions have added valuable perspectives to the
> > >>> discussions.
> > >>>>>> And we also benefit from the comparison of different solutions.
> > >>>>>> I’m also looking forward to seeing an initial version in KIP-932,
> > >> as
> > >>> it
> > >>>>>> will provide a good reference for future implementations.
> > >>>>>>
> > >>>>>> Regarding your comment on AS2, I wanted to clarify that my
> > >>>> specification
> > >>>>>> references org.apache.kafka.common.Uuid.
> > >>>>>> I believe we’re referring to the same class, and it might just be
> a
> > >>>> small
> > >>>>>> oversight due to the busy schedule.
> > >>>>>>
> > >>>>>> I want to express my gratitude once again for your many insightful
> > >>>>>> comments, which have helped the discussion progress smoothly.
> > >>>>>>
> > >>>>>> Best regards,
> > >>>>>> TengYao
> > >>>>>>
> > >>>>>>
> > >>>>>> Andrew Schofield <andrew_schofi...@live.com> 於 2024年8月22日 週四
> > >>>> 下午11:28寫道:
> > >>>>>>
> > >>>>>>> Hi TengYao,
> > >>>>>>> I’ve been reading through the comments and I’m happy that the
> > >> lobby
> > >>>>>>> approach has not gained support.
> > >>>>>>>
> > >>>>>>> Assuming that this KIP is voted, I will be happy to change
> KIP-932
> > >>> so
> > >>>>>>> that it only supports client-side member ID generation. Because
> > >> that
> > >>>> KIP
> > >>>>>>> is still
> > >>>>>>> under development, I can do this in the first version of
> > >>>>>>> ShareGroupHeartbeat.
> > >>>>>>>
> > >>>>>>> AS2: For the encoding section, I suppose the specific encoding
> > >> which
> > >>>>>>> is used is what org.apache.kafka.utils.Uuid uses.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Andrew
> > >>>>>>>
> > >>>>>>>> On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com>
> > >>> wrote:
> > >>>>>>>>
> > >>>>>>>> Hello Apoorv,
> > >>>>>>>> Thank you for your feedback.
> > >>>>>>>> Regarding the questions you raised, unfortunately, this KIP
> > >> cannot
> > >>>>>>>> guarantee the order of heartbeats. As with many classic
> > >>> distributed
> > >>>>>>> system
> > >>>>>>>> challenges, what we can do is make our best effort to ensure
> > >> that
> > >>>>> there
> > >>>>>>> are
> > >>>>>>>> no idle members or stale assignments under normal circumstances.
> > >>>>>>>>
> > >>>>>>>> As for the lobby approach, I’m not a fan of it because it
> > >> requires
> > >>>>>>> adding a
> > >>>>>>>> mechanism to maintain client state within the ConsumerGroup,
> > >>> which,
> > >>>> in
> > >>>>>>> my
> > >>>>>>>> view, resembles something like a two-phase commit. This would
> > >>>>> introduce
> > >>>>>>>> more complexity than the proposal in this KIP, which is
> > >> something
> > >>> we
> > >>>>>>> want
> > >>>>>>>> to avoid. KIP-848 aims to simplify the existing protocol, and
> > >>> while
> > >>>>> the
> > >>>>>>>> lobby approach is a good one, I believe it is not the right fit
> > >>> for
> > >>>>> this
> > >>>>>>>> particular situation.
> > >>>>>>>>
> > >>>>>>>> Best regards,
> > >>>>>>>> TengYao
> > >>>>>>>>
> > >>>>>>>> TengYao Chi <kiting...@gmail.com> 於 2024年8月14日 週三 下午11:45寫道:
> > >>>>>>>>
> > >>>>>>>>> Hi David,
> > >>>>>>>>>
> > >>>>>>>>> I really appreciate your review and suggestions. As I am still
> > >>>>> gaining
> > >>>>>>>>> experience in writing KIPs, your input has been incredibly
> > >>>> helpful. I
> > >>>>>>> am
> > >>>>>>>>> currently applying your suggestions to the KIP and will
> > >> complete
> > >>> it
> > >>>>> as
> > >>>>>>> soon
> > >>>>>>>>> as possible.
> > >>>>>>>>> Regarding the UUID part, I think we haven’t reached a
> > >> conclusion
> > >>>>>>> yet.(So
> > >>>>>>>>> far according to this thread)
> > >>>>>>>>> However, I will review the current implementation in the Kafka
> > >>>> `Uuid`
> > >>>>>>>>> class and include a brief specification in the KIP.
> > >>>>>>>>>
> > >>>>>>>>> Once again, thank you so much for your help.
> > >>>>>>>>>
> > >>>>>>>>> Best regards,
> > >>>>>>>>> TengYao
> > >>>>>>>>>
> > >>>>>>>>> Chia-Ping Tsai <chia7...@gmail.com> 於 2024年8月14日 週三 下午11:14寫道:
> > >>>>>>>>>
> > >>>>>>>>>> hi Apoorv
> > >>>>>>>>>>
> > >>>>>>>>>>> As the memberId is now known to the client, and client might
> > >>> send
> > >>>>> the
> > >>>>>>>>>> leave
> > >>>>>>>>>> group heartbeat on shutdown prior to receiving the initial
> > >>>> heartbeat
> > >>>>>>>>>> response. If that's true then how do we guarantee that the 2
> > >>>>> requests
> > >>>>>>> to
> > >>>>>>>>>> join and leave will be processed in order, which could still
> > >>> leave
> > >>>>>>> stale
> > >>>>>>>>>> members or throw unknown member id exceptions?
> > >>>>>>>>>>
> > >>>>>>>>>> This is definitely a good question. the short answer: no
> > >>> guarantee
> > >>>>> but
> > >>>>>>>>>> best
> > >>>>>>>>>> efforts
> > >>>>>>>>>>
> > >>>>>>>>>> Please notice the root cause is "we have no enough time to
> > >> wait
> > >>>>>>> member id
> > >>>>>>>>>> (response) when closing consumer". Sadly, we can' guarantee
> > >> the
> > >>>>>>> request
> > >>>>>>>>>> order due to the same reason.
> > >>>>>>>>>>
> > >>>>>>>>>> However, in contrast to previous behavior, there is one big
> > >>>> benefit
> > >>>>>>> of new
> > >>>>>>>>>> approach - we can try STONITH because we know the member id
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Chia-Ping
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Apoorv Mittal <apoorvmitta...@gmail.com> 於 2024年8月14日 週三
> > >>>> 下午8:55寫道:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi TengYao,
> > >>>>>>>>>>> Thanks for the KIP. Continuing on the point which Andrew
> > >>>> mentioned
> > >>>>> as
> > >>>>>>>>>> AS1.
> > >>>>>>>>>>>
> > >>>>>>>>>>> As the memberId is now known to the client, and client might
> > >>> send
> > >>>>> the
> > >>>>>>>>>> leave
> > >>>>>>>>>>> group heartbeat on shutdown prior to receiving the initial
> > >>>>> heartbeat
> > >>>>>>>>>>> response. If that's true then how do we guarantee that the 2
> > >>>>>>> requests to
> > >>>>>>>>>>> join and leave will be processed in order, which could still
> > >>>> leave
> > >>>>>>> stale
> > >>>>>>>>>>> members or throw unknown member id exceptions?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Though the client side member id generation is helpful which
> > >>> will
> > >>>>>>>>>> represent
> > >>>>>>>>>>> the same group perspective as from client and broker's end.
> > >>> But I
> > >>>>>>> think
> > >>>>>>>>>> the
> > >>>>>>>>>>> major concern we want to solve here is Stale Partition
> > >>>> Assignments
> > >>>>>>> which
> > >>>>>>>>>>> might still exist with the new approach. I am leaning towards
> > >>> the
> > >>>>>>>>>>> suggestion mentioned by Andrew where partition assignment
> > >>>> triggers
> > >>>>> on
> > >>>>>>>>>>> subsequent heartbeat when client acknowledges the initial
> > >>>>> heartbeat,
> > >>>>>>>>>>> delayed partition assignment.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Though on a separate note, I have a different question. What
> > >>>>> happens
> > >>>>>>>>>> when
> > >>>>>>>>>>> there is an issue with the client which sends the initial
> > >>>> heartbeat
> > >>>>>>>>>> without
> > >>>>>>>>>>> memberId, then crashes and restarts? I think we must be
> > >>>>> re-triggering
> > >>>>>>>>>>> assignments and expiring members only after the heartbeat
> > >>> session
> > >>>>>>>>>> timeout?
> > >>>>>>>>>>> If that's true then shall delayed partition assignment can
> > >> help
> > >>>>>>> benefit
> > >>>>>>>>>> us
> > >>>>>>>>>>> from this situation as well?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Regards,
> > >>>>>>>>>>> Apoorv Mittal
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Aug 14, 2024 at 12:51 PM David Jacot
> > >>>>>>>>>> <dja...@confluent.io.invalid>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi Andrew,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Personally, I don't like the lobby approach. It makes things
> > >>>> more
> > >>>>>>>>>>>> complicated and it would require changing the records on the
> > >>>>> server
> > >>>>>>>>>> too.
> > >>>>>>>>>>>> This is why I initially suggested the rejected alternative
> > >> #2
> > >>>>> which
> > >>>>>>> is
> > >>>>>>>>>>>> pretty close but also not perfect.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'd like to clarify one thing. The ConsumerGroupHeartbeat
> > >> API
> > >>>>>>> already
> > >>>>>>>>>>>> supports generating the member id on the client so we don't
> > >>> need
> > >>>>> any
> > >>>>>>>>>>>> conditional logic on the client side. This is actually what
> > >> we
> > >>>>>>> wanted
> > >>>>>>>>>> to
> > >>>>>>>>>>> do
> > >>>>>>>>>>>> in the first place but the idea got pushed back by Magnus
> > >> back
> > >>>>> then
> > >>>>>>>>>>> because
> > >>>>>>>>>>>> generating uuid from librdkafka required a new dependency.
> > >> It
> > >>>>> turns
> > >>>>>>>>>> out
> > >>>>>>>>>>>> that librdkafka has that dependency today. In retrospect, we
> > >>>>> should
> > >>>>>>>>>> have
> > >>>>>>>>>>>> pushed back on this. Long story short, we can just do it.
> > >> The
> > >>>>>>>>>> proposal in
> > >>>>>>>>>>>> this KIP is to make the member id required in future
> > >> versions.
> > >>>> We
> > >>>>>>>>>> could
> > >>>>>>>>>>>> also decide not to do it and to keep supporting both
> > >>>> approaches. I
> > >>>>>>>>>> would
> > >>>>>>>>>>>> also be fine with this.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>> David
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield <
> > >>>>>>>>>>>> andrew_schofi...@live.com>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi TengYao,
> > >>>>>>>>>>>>> Thanks for your response. I’ll have just one more try to
> > >>>>> persuade.
> > >>>>>>>>>>>>> I feel that I will need to follow the approach with KIP-932
> > >>>> when
> > >>>>>>>>>> we’ve
> > >>>>>>>>>>>>> made a decision, so I do have more than a passing interest
> > >> in
> > >>>>> this.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> A group member in the lobby is in the group, but it does
> > >> not
> > >>>> have
> > >>>>>>>>>> any
> > >>>>>>>>>>>>> assignments. A member of a consumer group can have no
> > >>> assigned
> > >>>>>>>>>>>>> partitions (such as 5 CG members subscribed to a topic
> > >> with 4
> > >>>>>>>>>>>> partitions),
> > >>>>>>>>>>>>> so it’s a situation that consumer group members already
> > >>> expect.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> One of Kafka’s strengths is the way that we handle API
> > >>>>> versioning.
> > >>>>>>>>>>>>> But, there is a cost - the behaviour is different depending
> > >>> on
> > >>>>> the
> > >>>>>>>>>> RPC
> > >>>>>>>>>>>>> version. KIP-848 is on the cusp of completion, but we’re
> > >>>> already
> > >>>>>>>>>> adding
> > >>>>>>>>>>>>> conditional logic for v0/v1 for ConsumerGroupHeartbeat.
> > >>> That’s
> > >>>> a
> > >>>>>>>>>> pity.
> > >>>>>>>>>>>>> Only a minor issue, but it’s unfortunate.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>> Andrew
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On 14 Aug 2024, at 08:47, TengYao Chi <
> > >> kiting...@gmail.com>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hello Andrew
> > >>>>>>>>>>>>>> Thank you for your thoughtful suggestions and getting the
> > >>>>>>>>>> discussion
> > >>>>>>>>>>>>> going.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> To AS1:
> > >>>>>>>>>>>>>> In the current scenario where the server generates the
> > >> UUID,
> > >>>> if
> > >>>>>>>>>> the
> > >>>>>>>>>>>>> client
> > >>>>>>>>>>>>>> shuts down before receiving the memberId generated by the
> > >> GC
> > >>>>>>>>>>>> (regardless
> > >>>>>>>>>>>>> of
> > >>>>>>>>>>>>>> whether it’s a graceful shutdown or not), the GC will
> > >> still
> > >>>> have
> > >>>>>>>>>> to
> > >>>>>>>>>>>> wait
> > >>>>>>>>>>>>>> for the heartbeat timeout because the client doesn’t know
> > >>> its
> > >>>>>>>>>>> memberId.
> > >>>>>>>>>>>>>> This KIP indeed cannot completely resolve the idempotency
> > >>>> issue,
> > >>>>>>>>>> but
> > >>>>>>>>>>> it
> > >>>>>>>>>>>>> can
> > >>>>>>>>>>>>>> better handle shutdown scenarios under normal
> > >> circumstances
> > >>>>>>>>>> because
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>> client always knows its memberId. Even if the client shuts
> > >>>> down
> > >>>>>>>>>>>>> immediately
> > >>>>>>>>>>>>>> after the initial heartbeat, as long as it performs a
> > >>> graceful
> > >>>>>>>>>>> shutdown
> > >>>>>>>>>>>>> and
> > >>>>>>>>>>>>>> sends a leave heartbeat, the GC can manage the situation
> > >> and
> > >>>>>>>>>> remove
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>> member. Therefore, the goal of this KIP is to address the
> > >>>> issue
> > >>>>>>>>>> where
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>> GC has to wait for the heartbeat timeout due to the client
> > >>>>> leaving
> > >>>>>>>>>>>>> without
> > >>>>>>>>>>>>>> knowing its memberId, which leads to reduced throughput
> > >> and
> > >>>>>>>>>> limited
> > >>>>>>>>>>>>>> scalability.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> The solution you suggest has also been proposed by David.
> > >>> The
> > >>>>>>>>>> concern
> > >>>>>>>>>>>>> with
> > >>>>>>>>>>>>>> this approach is that it introduces additional complexity
> > >>> for
> > >>>>>>>>>>>>>> compatibility, as the new server would not immediately add
> > >>> the
> > >>>>>>>>>> member
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>>>> the group, while the old server would. This requires
> > >> clients
> > >>>> to
> > >>>>>>>>>>>>>> differentiate whether their memberId has been added to the
> > >>>> group
> > >>>>>>>>>> or
> > >>>>>>>>>>>> not,
> > >>>>>>>>>>>>>> which could result in unexpected logs.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Best Regards,
> > >>>>>>>>>>>>>> TengYao
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Andrew Schofield <andrew_schofi...@live.com> 於 2024年8月14日
> > >>> 週三
> > >>>>>>>>>>>> 上午12:29寫道:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hi TengYao,
> > >>>>>>>>>>>>>>> Thanks for the KIP. I wonder if there’s a different way
> > >> to
> > >>>>> close
> > >>>>>>>>>>> what
> > >>>>>>>>>>>>>>> is quite a small window.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> AS1: It is true that the initial heartbeat is not
> > >>> idempotent,
> > >>>>> but
> > >>>>>>>>>>> this
> > >>>>>>>>>>>>>>> remains
> > >>>>>>>>>>>>>>> true with this KIP. It’s just differently not idempotent.
> > >>> If
> > >>>>> the
> > >>>>>>>>>>>> client
> > >>>>>>>>>>>>>>> makes its
> > >>>>>>>>>>>>>>> own member ID, sends a request and dies, the GC will
> > >> still
> > >>>> have
> > >>>>>>>>>>> added
> > >>>>>>>>>>>>>>> the member to the group and it will hang around until the
> > >>>>> session
> > >>>>>>>>>>>>> expires.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I wonder if the GC could still generate the member ID in
> > >>>>>>>>>> response to
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>> first
> > >>>>>>>>>>>>>>> heartbeat, and put the member in a special PENDING state
> > >>> with
> > >>>>> no
> > >>>>>>>>>>>>>>> assignments until the client sends the next heartbeat,
> > >> thus
> > >>>>>>>>>>> confirming
> > >>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>> has received the member ID. This would not be a protocol
> > >>>> change
> > >>>>>>>>>> at
> > >>>>>>>>>>>> all,
> > >>>>>>>>>>>>>>> just
> > >>>>>>>>>>>>>>> a change to the GC to keep a new member in the lobby
> > >> until
> > >>>> it’s
> > >>>>>>>>>>>>> comfirmed
> > >>>>>>>>>>>>>>> it knows its member ID.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>> Andrew
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On 13 Aug 2024, at 15:59, TengYao Chi <
> > >>> kiting...@gmail.com>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Hi Chia-Ping,
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Thanks for review and suggestions.
> > >>>>>>>>>>>>>>>> I have updated the content of KIP accordingly.
> > >>>>>>>>>>>>>>>> Please take a look.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Best regards,
> > >>>>>>>>>>>>>>>> TengYao
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Chia-Ping Tsai <chia7...@apache.org> 於 2024年8月13日 週二
> > >>>>> 下午9:45寫道:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> hi TengYao
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> thanks for this KIP.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> 1) could you please describe the before/after behavior
> > >> in
> > >>>> the
> > >>>>>>>>>>>>> "Proposed
> > >>>>>>>>>>>>>>>>> Changes" section? IIRC, current RPC allows HB having
> > >>> member
> > >>>>> id
> > >>>>>>>>>>>>>>> generated by
> > >>>>>>>>>>>>>>>>> client, right? If HB has no member ID, server will
> > >>> generate
> > >>>>> one
> > >>>>>>>>>>> and
> > >>>>>>>>>>>>> then
> > >>>>>>>>>>>>>>>>> return. The new behavior will enforce HB "must" have
> > >>> member
> > >>>>> ID.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> 2) could you please write the version number explicitly
> > >>> in
> > >>>>> the
> > >>>>>>>>>> KIP
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> 3) how new client code handle the old HB? Does it
> > >> always
> > >>>>>>>>>> generate
> > >>>>>>>>>>>>> member
> > >>>>>>>>>>>>>>>>> ID on client-side even though that is not restricted?
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>> Chia-Ping
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On 2024/08/13 06:20:42 TengYao Chi wrote:
> > >>>>>>>>>>>>>>>>>> Hello everyone,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I would like to start a discussion thread on KIP-1082,
> > >>>> which
> > >>>>>>>>>>>> proposes
> > >>>>>>>>>>>>>>>>>> enabling id generation for clients over the
> > >>>>>>>>>>> ConsumerGroupHeartbeat
> > >>>>>>>>>>>>> RPC.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Here is the KIP Link: KIP-1082
> > >>>>>>>>>>>>>>>>>> <
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Please take a look and let me know what you think,
> > >> and I
> > >>>>> would
> > >>>>>>>>>>>>>>> appreciate
> > >>>>>>>>>>>>>>>>>> any suggestions and feedback.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Best regards,
> > >>>>>>>>>>>>>>>>>> TengYao
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >
>

Reply via email to