hi TengYao

KT2: +1 to second approach

Best,
Chia-Ping


David Jacot <dja...@confluent.io.invalid> 於 2024年8月30日 週五 下午9:15寫道:

> Hi TengYao,
>
> KT2: I don't think that we can realistically validate the uuid on the
> server. It is basically a string of chars. So I lean towards having a good
> recommendation in the KIP and in the document of the field in the RPC's
> definition.
>
> Best,
> David
>
> On Fri, Aug 30, 2024 at 3:02 PM TengYao Chi <kiting...@gmail.com> wrote:
>
> > Hello Kirk !
> >
> > Thank you for your comments !
> >
> > KT1: Yes, you are correct. The issue is not unique to the initial
> > heartbeat; there can always be cases where the broker might lose
> connection
> > with a member.
> >
> > KT2: Currently, if the client doesn't have a member ID and the
> memberEpoch
> > equals 0, the coordinator will generate a UUID as the member ID for the
> > client. However, at the RPC level, the member ID is sent as a literal
> > string, meaning there are no restrictions on the format at this level.
> > This also reminds me that we haven't reached a final conclusion on how to
> > enforce the use of UUIDs.
> > From our previous discussions, I recall two possible approaches:
> > The first is to validate the UUID on the server side, and if it's not
> > valid, throw an exception to the client.
> > The second is to document a recommendation for clients to use UUIDs as
> > member IDs, without strictly enforcing it.
> > I think it's time to decide on the approach we want to take.
> >
> > KT3: Yes, "session" can be considered synonymous with "membership" in
> this
> > context.
> >
> > KT4: Thank you for pointing that out. I will update the wording to
> > specifically say this behavior is for consumers.
> >
> > Thanks again for your comments.
> >
> > Best regards,
> > TengYao
> >
> > Kirk True <k...@kirktrue.pro> 於 2024年8月30日 週五 上午12:39寫道:
> >
> > > Hi TengYao!
> > >
> > > Sorry for being late to the discussion...
> > >
> > > After reading the thread and then the KIP, I had a few
> > questions/comments:
> > >
> > > KT1: In Motivation, it states: "This scenario can result in the broker
> > > registering a new member for which it will never receive a proper leave
> > > request.” Just to be clear, the broker will always have cases where it
> > > might lose connection with a member. That’s not unique to the initial
> > > heartbeat, right?
> > >
> > > KT2: There was a bit of back and forth about format of the member ID.
> > From
> > > what I gathered in the thread, the member ID is still defined in the
> RPC
> > as
> > > a string and not a UUID, right? The KIP states that the “client must
> > > generate a UUID as the member ID” and that the “server will validate
> > that a
> > > valid UUID is provided.” Is that a change for the server, or is it
> > already
> > > enforced as a UUID?
> > >
> > > KT3: Lianet mentioned some confusion over the use of the word
> “session.”
> > > Isn’t “session” synonymous with “membership?”
> > >
> > > KT4: Under “Member ID Lifecycle,” it states: "The client should reuse
> the
> > > same UUID as the member ID for all heartbeats and rejoin attempts to
> > > maintain continuity within the group.” Could we change the first part
> of
> > > that to “The Consumer instance should…” We do have lifetimes that
> extend
> > > past the lifetime of a client instance (such as the transaction ID).
> > >
> > > Thanks,
> > > Kirk
> > >
> > > > On Aug 29, 2024, at 1:28 AM, TengYao Chi <kiting...@gmail.com>
> wrote:
> > > >
> > > > Hi David,
> > > >
> > > > Thank you for pointing that out.
> > > > I have updated the content of the KIP based on Lianet's and your
> > > feedback.
> > > > Please take a look and let me know your thoughts.
> > > >
> > > > Best regards,
> > > > TengYao
> > > >
> > > > David Jacot <dja...@confluent.io.invalid> 於 2024年8月29日 週四 下午3:20寫道:
> > > >
> > > >> Hi TengYao,
> > > >>
> > > >> Thanks for the update. I haven't fully read it yet but I will soon.
> > > >>
> > > >> LM4: This is incorrect. The consumer must keep its member id during
> > its
> > > >> entire lifetime (until the process stops or dies). The protocol
> > > stipulates
> > > >> that a member must rejoin with the same member id and the member
> epoch
> > > set
> > > >> to zero when an FENCED_MEMBER_EPOCH occurs. This allows the member
> to
> > > >> resynchronize itself. We should not change this behavior. I think
> that
> > > we
> > > >> should see the client side generation id as an incarnation id of the
> > > >> application. It is generated once and kept until it stops or dies.
> > > >>
> > > >> Best,
> > > >> David
> > > >>
> > > >> On Thu, Aug 29, 2024 at 6:21 AM TengYao Chi <kiting...@gmail.com>
> > > wrote:
> > > >>
> > > >>> Hello Lianet !
> > > >>>
> > > >>> Thanks for the reviews and suggestions!
> > > >>>
> > > >>> LM1: Indeed, we plan to enforce client-side ID generation in the
> > > future,
> > > >>> and it is not an alternative. I will change the title accordingly.
> > > >>>
> > > >>> LM2: Yes, that's the expectation. I will add that statement to the
> > > public
> > > >>> interface section.
> > > >>>
> > > >>> LM3: Thank you for the high-level perspective review. I think
> you're
> > > >> right;
> > > >>> our intention isn't very clear since it was placed at the end of
> the
> > > >>> section. I will try to rephrase that section to make it more
> obvious.
> > > >>>
> > > >>> LM4: Regarding the definition of "session" in this KIP, I believe
> it
> > > >> refers
> > > >>> to the period between the *first-time heartbeat* and when the
> > *consumer
> > > >>> leaves the group* (whether through a graceful shutdown or a
> heartbeat
> > > >>> timeout). The consumer should reuse its UUID if it has been
> generated
> > > >>> before. The only situation in which it will regenerate the UUID is
> if
> > > the
> > > >>> coordinator finds that there is already a consumer with the same
> > UUID.
> > > >>> IIRC, the coordinator should compare the member epochs, and the
> > > >>> later-joined consumer should be fenced off by the coordinator due
> to
> > > >> having
> > > >>> a lower member epoch. Once the consumer receives a
> > > `FENCED_MEMBER_EPOCH`
> > > >>> error, it will generate a new UUID and attempt to rejoin. I will
> > > clarify
> > > >>> this in the KIP.
> > > >>>
> > > >>> Thanks again for your reviews, I really appreciate it.
> > > >>>
> > > >>> Best regards,
> > > >>> TengYao
> > > >>>
> > > >>> Lianet M. <liane...@gmail.com> 於 2024年8月28日 週三 下午7:12寫道:
> > > >>>
> > > >>>> Hello TengYao! Thanks for taking on this issue, we've been going
> > > around
> > > >>> it
> > > >>>> for a while.
> > > >>>>
> > > >>>> LM1: About the title of the KIP: "Enable ID Generation for Clients
> > > over
> > > >>> the
> > > >>>> ConsumerGroupHeartbeat RPC". I find it confusing because it hints
> > that
> > > >>>> we're adding it as an alternative (which was discussed and
> > discarded,
> > > >> in
> > > >>>> favour of really enforcing it). It's also missing the core change
> > imo,
> > > >>>> which is "where" the generation happens. So, maybe more to the
> point
> > > >> with
> > > >>>> something along the lines of "Client-side generated ID for clients
> > > over
> > > >>>> ConsumerGroupHeartbeat RPC"?
> > > >>>>
> > > >>>> LM2: On the public interfaces section, the KIP states that "the
> > server
> > > >>> will
> > > >>>> reject the request", but we should agree on the specific error
> > type. I
> > > >>>> expect it should fail with an InvalidRequestException, is that the
> > > >>>> intention? (This was also suggested in the discussion thread
> before
> > > but
> > > >>> is
> > > >>>> not in the KIP).
> > > >>>>
> > > >>>> LM3. Related to my previous point, I find that to be the true
> > > >>> public-facing
> > > >>>> change (member ID mandatory at the protocol level), but it's only
> at
> > > >> the
> > > >>>> end of the Public interfaces changes, kind of lost among details
> of
> > > how
> > > >>>> we're going to do it. Should we rephrase that section with the
> > actual
> > > >>>> change first, and the hows after (ex. Bumping the version is not
> the
> > > >>>> public-facing change in this case, it's just the mechanism to
> > properly
> > > >>>> introduce our change)
> > > >>>>
> > > >>>> LM4. Regarding the lifetime of the UUID: the KIP states we will
> > > "Verify
> > > >>>> that the UUID remains consistent across all subsequent heartbeats
> > > >> during
> > > >>>> the session". What is this "session" referring to here? I would
> > expect
> > > >>> that
> > > >>>> the UUID is associated to a consumer instance (generated for the
> > > >> consumer
> > > >>>> the first time it needs to send a HB if it doesn't have the UUID
> > yet.
> > > >>> From
> > > >>>> there on, every time it needs to send a "first HB" again, it will
> > > reuse
> > > >>> its
> > > >>>> UUID, is that the intention? Note that we should consider that the
> > > same
> > > >>>> consumer instance may have many "first heartbeats", meaning
> > heartbeats
> > > >> to
> > > >>>> join the group when it's not part of it (ex. consumer unsubscribe
> +
> > > >>>> subscribe, fenced, stale). Is this the intention or are you
> > > considering
> > > >>> the
> > > >>>> lifetime differently? We should clarify it in the KIP.
> > > >>>>
> > > >>>> Thanks!
> > > >>>>
> > > >>>> Lianet
> > > >>>>
> > > >>>> On Tue, Aug 27, 2024 at 2:27 AM TengYao Chi <kiting...@gmail.com>
> > > >> wrote:
> > > >>>>
> > > >>>>> Hi everyone,
> > > >>>>>
> > > >>>>> I have revised this KIP multiple times based on the feedback from
> > our
> > > >>>>> discussions.
> > > >>>>> I would greatly appreciate it if you could review it when you
> have
> > > >> the
> > > >>>>> time.
> > > >>>>> If there are no further comments or suggestions, I plan to
> proceed
> > > >> with
> > > >>>>> initiating a vote soon.
> > > >>>>>
> > > >>>>> Best regards,
> > > >>>>> TengYao
> > > >>>>>
> > > >>>>> TengYao Chi <kiting...@gmail.com> 於 2024年8月23日 週五 下午2:43寫道:
> > > >>>>>
> > > >>>>>> Hi Andrew,
> > > >>>>>> Thank you for your previous feedback and insights.
> > > >>>>>> Your contributions have added valuable perspectives to the
> > > >>> discussions.
> > > >>>>>> And we also benefit from the comparison of different solutions.
> > > >>>>>> I’m also looking forward to seeing an initial version in
> KIP-932,
> > > >> as
> > > >>> it
> > > >>>>>> will provide a good reference for future implementations.
> > > >>>>>>
> > > >>>>>> Regarding your comment on AS2, I wanted to clarify that my
> > > >>>> specification
> > > >>>>>> references org.apache.kafka.common.Uuid.
> > > >>>>>> I believe we’re referring to the same class, and it might just
> be
> > a
> > > >>>> small
> > > >>>>>> oversight due to the busy schedule.
> > > >>>>>>
> > > >>>>>> I want to express my gratitude once again for your many
> insightful
> > > >>>>>> comments, which have helped the discussion progress smoothly.
> > > >>>>>>
> > > >>>>>> Best regards,
> > > >>>>>> TengYao
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Andrew Schofield <andrew_schofi...@live.com> 於 2024年8月22日 週四
> > > >>>> 下午11:28寫道:
> > > >>>>>>
> > > >>>>>>> Hi TengYao,
> > > >>>>>>> I’ve been reading through the comments and I’m happy that the
> > > >> lobby
> > > >>>>>>> approach has not gained support.
> > > >>>>>>>
> > > >>>>>>> Assuming that this KIP is voted, I will be happy to change
> > KIP-932
> > > >>> so
> > > >>>>>>> that it only supports client-side member ID generation. Because
> > > >> that
> > > >>>> KIP
> > > >>>>>>> is still
> > > >>>>>>> under development, I can do this in the first version of
> > > >>>>>>> ShareGroupHeartbeat.
> > > >>>>>>>
> > > >>>>>>> AS2: For the encoding section, I suppose the specific encoding
> > > >> which
> > > >>>>>>> is used is what org.apache.kafka.utils.Uuid uses.
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Andrew
> > > >>>>>>>
> > > >>>>>>>> On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com>
> > > >>> wrote:
> > > >>>>>>>>
> > > >>>>>>>> Hello Apoorv,
> > > >>>>>>>> Thank you for your feedback.
> > > >>>>>>>> Regarding the questions you raised, unfortunately, this KIP
> > > >> cannot
> > > >>>>>>>> guarantee the order of heartbeats. As with many classic
> > > >>> distributed
> > > >>>>>>> system
> > > >>>>>>>> challenges, what we can do is make our best effort to ensure
> > > >> that
> > > >>>>> there
> > > >>>>>>> are
> > > >>>>>>>> no idle members or stale assignments under normal
> circumstances.
> > > >>>>>>>>
> > > >>>>>>>> As for the lobby approach, I’m not a fan of it because it
> > > >> requires
> > > >>>>>>> adding a
> > > >>>>>>>> mechanism to maintain client state within the ConsumerGroup,
> > > >>> which,
> > > >>>> in
> > > >>>>>>> my
> > > >>>>>>>> view, resembles something like a two-phase commit. This would
> > > >>>>> introduce
> > > >>>>>>>> more complexity than the proposal in this KIP, which is
> > > >> something
> > > >>> we
> > > >>>>>>> want
> > > >>>>>>>> to avoid. KIP-848 aims to simplify the existing protocol, and
> > > >>> while
> > > >>>>> the
> > > >>>>>>>> lobby approach is a good one, I believe it is not the right
> fit
> > > >>> for
> > > >>>>> this
> > > >>>>>>>> particular situation.
> > > >>>>>>>>
> > > >>>>>>>> Best regards,
> > > >>>>>>>> TengYao
> > > >>>>>>>>
> > > >>>>>>>> TengYao Chi <kiting...@gmail.com> 於 2024年8月14日 週三 下午11:45寫道:
> > > >>>>>>>>
> > > >>>>>>>>> Hi David,
> > > >>>>>>>>>
> > > >>>>>>>>> I really appreciate your review and suggestions. As I am
> still
> > > >>>>> gaining
> > > >>>>>>>>> experience in writing KIPs, your input has been incredibly
> > > >>>> helpful. I
> > > >>>>>>> am
> > > >>>>>>>>> currently applying your suggestions to the KIP and will
> > > >> complete
> > > >>> it
> > > >>>>> as
> > > >>>>>>> soon
> > > >>>>>>>>> as possible.
> > > >>>>>>>>> Regarding the UUID part, I think we haven’t reached a
> > > >> conclusion
> > > >>>>>>> yet.(So
> > > >>>>>>>>> far according to this thread)
> > > >>>>>>>>> However, I will review the current implementation in the
> Kafka
> > > >>>> `Uuid`
> > > >>>>>>>>> class and include a brief specification in the KIP.
> > > >>>>>>>>>
> > > >>>>>>>>> Once again, thank you so much for your help.
> > > >>>>>>>>>
> > > >>>>>>>>> Best regards,
> > > >>>>>>>>> TengYao
> > > >>>>>>>>>
> > > >>>>>>>>> Chia-Ping Tsai <chia7...@gmail.com> 於 2024年8月14日 週三
> 下午11:14寫道:
> > > >>>>>>>>>
> > > >>>>>>>>>> hi Apoorv
> > > >>>>>>>>>>
> > > >>>>>>>>>>> As the memberId is now known to the client, and client
> might
> > > >>> send
> > > >>>>> the
> > > >>>>>>>>>> leave
> > > >>>>>>>>>> group heartbeat on shutdown prior to receiving the initial
> > > >>>> heartbeat
> > > >>>>>>>>>> response. If that's true then how do we guarantee that the 2
> > > >>>>> requests
> > > >>>>>>> to
> > > >>>>>>>>>> join and leave will be processed in order, which could still
> > > >>> leave
> > > >>>>>>> stale
> > > >>>>>>>>>> members or throw unknown member id exceptions?
> > > >>>>>>>>>>
> > > >>>>>>>>>> This is definitely a good question. the short answer: no
> > > >>> guarantee
> > > >>>>> but
> > > >>>>>>>>>> best
> > > >>>>>>>>>> efforts
> > > >>>>>>>>>>
> > > >>>>>>>>>> Please notice the root cause is "we have no enough time to
> > > >> wait
> > > >>>>>>> member id
> > > >>>>>>>>>> (response) when closing consumer". Sadly, we can' guarantee
> > > >> the
> > > >>>>>>> request
> > > >>>>>>>>>> order due to the same reason.
> > > >>>>>>>>>>
> > > >>>>>>>>>> However, in contrast to previous behavior, there is one big
> > > >>>> benefit
> > > >>>>>>> of new
> > > >>>>>>>>>> approach - we can try STONITH because we know the member id
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Chia-Ping
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> Apoorv Mittal <apoorvmitta...@gmail.com> 於 2024年8月14日 週三
> > > >>>> 下午8:55寫道:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi TengYao,
> > > >>>>>>>>>>> Thanks for the KIP. Continuing on the point which Andrew
> > > >>>> mentioned
> > > >>>>> as
> > > >>>>>>>>>> AS1.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> As the memberId is now known to the client, and client
> might
> > > >>> send
> > > >>>>> the
> > > >>>>>>>>>> leave
> > > >>>>>>>>>>> group heartbeat on shutdown prior to receiving the initial
> > > >>>>> heartbeat
> > > >>>>>>>>>>> response. If that's true then how do we guarantee that the
> 2
> > > >>>>>>> requests to
> > > >>>>>>>>>>> join and leave will be processed in order, which could
> still
> > > >>>> leave
> > > >>>>>>> stale
> > > >>>>>>>>>>> members or throw unknown member id exceptions?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Though the client side member id generation is helpful
> which
> > > >>> will
> > > >>>>>>>>>> represent
> > > >>>>>>>>>>> the same group perspective as from client and broker's end.
> > > >>> But I
> > > >>>>>>> think
> > > >>>>>>>>>> the
> > > >>>>>>>>>>> major concern we want to solve here is Stale Partition
> > > >>>> Assignments
> > > >>>>>>> which
> > > >>>>>>>>>>> might still exist with the new approach. I am leaning
> towards
> > > >>> the
> > > >>>>>>>>>>> suggestion mentioned by Andrew where partition assignment
> > > >>>> triggers
> > > >>>>> on
> > > >>>>>>>>>>> subsequent heartbeat when client acknowledges the initial
> > > >>>>> heartbeat,
> > > >>>>>>>>>>> delayed partition assignment.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Though on a separate note, I have a different question.
> What
> > > >>>>> happens
> > > >>>>>>>>>> when
> > > >>>>>>>>>>> there is an issue with the client which sends the initial
> > > >>>> heartbeat
> > > >>>>>>>>>> without
> > > >>>>>>>>>>> memberId, then crashes and restarts? I think we must be
> > > >>>>> re-triggering
> > > >>>>>>>>>>> assignments and expiring members only after the heartbeat
> > > >>> session
> > > >>>>>>>>>> timeout?
> > > >>>>>>>>>>> If that's true then shall delayed partition assignment can
> > > >> help
> > > >>>>>>> benefit
> > > >>>>>>>>>> us
> > > >>>>>>>>>>> from this situation as well?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Regards,
> > > >>>>>>>>>>> Apoorv Mittal
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Wed, Aug 14, 2024 at 12:51 PM David Jacot
> > > >>>>>>>>>> <dja...@confluent.io.invalid>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> Hi Andrew,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Personally, I don't like the lobby approach. It makes
> things
> > > >>>> more
> > > >>>>>>>>>>>> complicated and it would require changing the records on
> the
> > > >>>>> server
> > > >>>>>>>>>> too.
> > > >>>>>>>>>>>> This is why I initially suggested the rejected alternative
> > > >> #2
> > > >>>>> which
> > > >>>>>>> is
> > > >>>>>>>>>>>> pretty close but also not perfect.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I'd like to clarify one thing. The ConsumerGroupHeartbeat
> > > >> API
> > > >>>>>>> already
> > > >>>>>>>>>>>> supports generating the member id on the client so we
> don't
> > > >>> need
> > > >>>>> any
> > > >>>>>>>>>>>> conditional logic on the client side. This is actually
> what
> > > >> we
> > > >>>>>>> wanted
> > > >>>>>>>>>> to
> > > >>>>>>>>>>> do
> > > >>>>>>>>>>>> in the first place but the idea got pushed back by Magnus
> > > >> back
> > > >>>>> then
> > > >>>>>>>>>>> because
> > > >>>>>>>>>>>> generating uuid from librdkafka required a new dependency.
> > > >> It
> > > >>>>> turns
> > > >>>>>>>>>> out
> > > >>>>>>>>>>>> that librdkafka has that dependency today. In retrospect,
> we
> > > >>>>> should
> > > >>>>>>>>>> have
> > > >>>>>>>>>>>> pushed back on this. Long story short, we can just do it.
> > > >> The
> > > >>>>>>>>>> proposal in
> > > >>>>>>>>>>>> this KIP is to make the member id required in future
> > > >> versions.
> > > >>>> We
> > > >>>>>>>>>> could
> > > >>>>>>>>>>>> also decide not to do it and to keep supporting both
> > > >>>> approaches. I
> > > >>>>>>>>>> would
> > > >>>>>>>>>>>> also be fine with this.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Best,
> > > >>>>>>>>>>>> David
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield <
> > > >>>>>>>>>>>> andrew_schofi...@live.com>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Hi TengYao,
> > > >>>>>>>>>>>>> Thanks for your response. I’ll have just one more try to
> > > >>>>> persuade.
> > > >>>>>>>>>>>>> I feel that I will need to follow the approach with
> KIP-932
> > > >>>> when
> > > >>>>>>>>>> we’ve
> > > >>>>>>>>>>>>> made a decision, so I do have more than a passing
> interest
> > > >> in
> > > >>>>> this.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> A group member in the lobby is in the group, but it does
> > > >> not
> > > >>>> have
> > > >>>>>>>>>> any
> > > >>>>>>>>>>>>> assignments. A member of a consumer group can have no
> > > >>> assigned
> > > >>>>>>>>>>>>> partitions (such as 5 CG members subscribed to a topic
> > > >> with 4
> > > >>>>>>>>>>>> partitions),
> > > >>>>>>>>>>>>> so it’s a situation that consumer group members already
> > > >>> expect.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> One of Kafka’s strengths is the way that we handle API
> > > >>>>> versioning.
> > > >>>>>>>>>>>>> But, there is a cost - the behaviour is different
> depending
> > > >>> on
> > > >>>>> the
> > > >>>>>>>>>> RPC
> > > >>>>>>>>>>>>> version. KIP-848 is on the cusp of completion, but we’re
> > > >>>> already
> > > >>>>>>>>>> adding
> > > >>>>>>>>>>>>> conditional logic for v0/v1 for ConsumerGroupHeartbeat.
> > > >>> That’s
> > > >>>> a
> > > >>>>>>>>>> pity.
> > > >>>>>>>>>>>>> Only a minor issue, but it’s unfortunate.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>> Andrew
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On 14 Aug 2024, at 08:47, TengYao Chi <
> > > >> kiting...@gmail.com>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Hello Andrew
> > > >>>>>>>>>>>>>> Thank you for your thoughtful suggestions and getting
> the
> > > >>>>>>>>>> discussion
> > > >>>>>>>>>>>>> going.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> To AS1:
> > > >>>>>>>>>>>>>> In the current scenario where the server generates the
> > > >> UUID,
> > > >>>> if
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>>> client
> > > >>>>>>>>>>>>>> shuts down before receiving the memberId generated by
> the
> > > >> GC
> > > >>>>>>>>>>>> (regardless
> > > >>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>> whether it’s a graceful shutdown or not), the GC will
> > > >> still
> > > >>>> have
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>> wait
> > > >>>>>>>>>>>>>> for the heartbeat timeout because the client doesn’t
> know
> > > >>> its
> > > >>>>>>>>>>> memberId.
> > > >>>>>>>>>>>>>> This KIP indeed cannot completely resolve the
> idempotency
> > > >>>> issue,
> > > >>>>>>>>>> but
> > > >>>>>>>>>>> it
> > > >>>>>>>>>>>>> can
> > > >>>>>>>>>>>>>> better handle shutdown scenarios under normal
> > > >> circumstances
> > > >>>>>>>>>> because
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>> client always knows its memberId. Even if the client
> shuts
> > > >>>> down
> > > >>>>>>>>>>>>> immediately
> > > >>>>>>>>>>>>>> after the initial heartbeat, as long as it performs a
> > > >>> graceful
> > > >>>>>>>>>>> shutdown
> > > >>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>> sends a leave heartbeat, the GC can manage the situation
> > > >> and
> > > >>>>>>>>>> remove
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>> member. Therefore, the goal of this KIP is to address
> the
> > > >>>> issue
> > > >>>>>>>>>> where
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>> GC has to wait for the heartbeat timeout due to the
> client
> > > >>>>> leaving
> > > >>>>>>>>>>>>> without
> > > >>>>>>>>>>>>>> knowing its memberId, which leads to reduced throughput
> > > >> and
> > > >>>>>>>>>> limited
> > > >>>>>>>>>>>>>> scalability.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> The solution you suggest has also been proposed by
> David.
> > > >>> The
> > > >>>>>>>>>> concern
> > > >>>>>>>>>>>>> with
> > > >>>>>>>>>>>>>> this approach is that it introduces additional
> complexity
> > > >>> for
> > > >>>>>>>>>>>>>> compatibility, as the new server would not immediately
> add
> > > >>> the
> > > >>>>>>>>>> member
> > > >>>>>>>>>>>> to
> > > >>>>>>>>>>>>>> the group, while the old server would. This requires
> > > >> clients
> > > >>>> to
> > > >>>>>>>>>>>>>> differentiate whether their memberId has been added to
> the
> > > >>>> group
> > > >>>>>>>>>> or
> > > >>>>>>>>>>>> not,
> > > >>>>>>>>>>>>>> which could result in unexpected logs.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Best Regards,
> > > >>>>>>>>>>>>>> TengYao
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Andrew Schofield <andrew_schofi...@live.com> 於
> 2024年8月14日
> > > >>> 週三
> > > >>>>>>>>>>>> 上午12:29寫道:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Hi TengYao,
> > > >>>>>>>>>>>>>>> Thanks for the KIP. I wonder if there’s a different way
> > > >> to
> > > >>>>> close
> > > >>>>>>>>>>> what
> > > >>>>>>>>>>>>>>> is quite a small window.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> AS1: It is true that the initial heartbeat is not
> > > >>> idempotent,
> > > >>>>> but
> > > >>>>>>>>>>> this
> > > >>>>>>>>>>>>>>> remains
> > > >>>>>>>>>>>>>>> true with this KIP. It’s just differently not
> idempotent.
> > > >>> If
> > > >>>>> the
> > > >>>>>>>>>>>> client
> > > >>>>>>>>>>>>>>> makes its
> > > >>>>>>>>>>>>>>> own member ID, sends a request and dies, the GC will
> > > >> still
> > > >>>> have
> > > >>>>>>>>>>> added
> > > >>>>>>>>>>>>>>> the member to the group and it will hang around until
> the
> > > >>>>> session
> > > >>>>>>>>>>>>> expires.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> I wonder if the GC could still generate the member ID
> in
> > > >>>>>>>>>> response to
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> first
> > > >>>>>>>>>>>>>>> heartbeat, and put the member in a special PENDING
> state
> > > >>> with
> > > >>>>> no
> > > >>>>>>>>>>>>>>> assignments until the client sends the next heartbeat,
> > > >> thus
> > > >>>>>>>>>>> confirming
> > > >>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>> has received the member ID. This would not be a
> protocol
> > > >>>> change
> > > >>>>>>>>>> at
> > > >>>>>>>>>>>> all,
> > > >>>>>>>>>>>>>>> just
> > > >>>>>>>>>>>>>>> a change to the GC to keep a new member in the lobby
> > > >> until
> > > >>>> it’s
> > > >>>>>>>>>>>>> comfirmed
> > > >>>>>>>>>>>>>>> it knows its member ID.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>> Andrew
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On 13 Aug 2024, at 15:59, TengYao Chi <
> > > >>> kiting...@gmail.com>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Hi Chia-Ping,
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Thanks for review and suggestions.
> > > >>>>>>>>>>>>>>>> I have updated the content of KIP accordingly.
> > > >>>>>>>>>>>>>>>> Please take a look.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Best regards,
> > > >>>>>>>>>>>>>>>> TengYao
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Chia-Ping Tsai <chia7...@apache.org> 於 2024年8月13日 週二
> > > >>>>> 下午9:45寫道:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> hi TengYao
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> thanks for this KIP.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 1) could you please describe the before/after
> behavior
> > > >> in
> > > >>>> the
> > > >>>>>>>>>>>>> "Proposed
> > > >>>>>>>>>>>>>>>>> Changes" section? IIRC, current RPC allows HB having
> > > >>> member
> > > >>>>> id
> > > >>>>>>>>>>>>>>> generated by
> > > >>>>>>>>>>>>>>>>> client, right? If HB has no member ID, server will
> > > >>> generate
> > > >>>>> one
> > > >>>>>>>>>>> and
> > > >>>>>>>>>>>>> then
> > > >>>>>>>>>>>>>>>>> return. The new behavior will enforce HB "must" have
> > > >>> member
> > > >>>>> ID.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 2) could you please write the version number
> explicitly
> > > >>> in
> > > >>>>> the
> > > >>>>>>>>>> KIP
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 3) how new client code handle the old HB? Does it
> > > >> always
> > > >>>>>>>>>> generate
> > > >>>>>>>>>>>>> member
> > > >>>>>>>>>>>>>>>>> ID on client-side even though that is not restricted?
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>> Chia-Ping
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On 2024/08/13 06:20:42 TengYao Chi wrote:
> > > >>>>>>>>>>>>>>>>>> Hello everyone,
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> I would like to start a discussion thread on
> KIP-1082,
> > > >>>> which
> > > >>>>>>>>>>>> proposes
> > > >>>>>>>>>>>>>>>>>> enabling id generation for clients over the
> > > >>>>>>>>>>> ConsumerGroupHeartbeat
> > > >>>>>>>>>>>>> RPC.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Here is the KIP Link: KIP-1082
> > > >>>>>>>>>>>>>>>>>> <
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Please take a look and let me know what you think,
> > > >> and I
> > > >>>>> would
> > > >>>>>>>>>>>>>>> appreciate
> > > >>>>>>>>>>>>>>>>>> any suggestions and feedback.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Best regards,
> > > >>>>>>>>>>>>>>>>>> TengYao
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>

Reply via email to