Hi TengYao,

Thanks for the update. I haven't fully read it yet but I will soon.

LM4: This is incorrect. The consumer must keep its member id during its
entire lifetime (until the process stops or dies). The protocol stipulates
that a member must rejoin with the same member id and the member epoch set
to zero when an FENCED_MEMBER_EPOCH occurs. This allows the member to
resynchronize itself. We should not change this behavior. I think that we
should see the client side generation id as an incarnation id of the
application. It is generated once and kept until it stops or dies.

Best,
David

On Thu, Aug 29, 2024 at 6:21 AM TengYao Chi <kiting...@gmail.com> wrote:

> Hello Lianet !
>
> Thanks for the reviews and suggestions!
>
> LM1: Indeed, we plan to enforce client-side ID generation in the future,
> and it is not an alternative. I will change the title accordingly.
>
> LM2: Yes, that's the expectation. I will add that statement to the public
> interface section.
>
> LM3: Thank you for the high-level perspective review. I think you're right;
> our intention isn't very clear since it was placed at the end of the
> section. I will try to rephrase that section to make it more obvious.
>
> LM4: Regarding the definition of "session" in this KIP, I believe it refers
> to the period between the *first-time heartbeat* and when the *consumer
> leaves the group* (whether through a graceful shutdown or a heartbeat
> timeout). The consumer should reuse its UUID if it has been generated
> before. The only situation in which it will regenerate the UUID is if the
> coordinator finds that there is already a consumer with the same UUID.
> IIRC, the coordinator should compare the member epochs, and the
> later-joined consumer should be fenced off by the coordinator due to having
> a lower member epoch. Once the consumer receives a `FENCED_MEMBER_EPOCH`
> error, it will generate a new UUID and attempt to rejoin. I will clarify
> this in the KIP.
>
> Thanks again for your reviews, I really appreciate it.
>
> Best regards,
> TengYao
>
> Lianet M. <liane...@gmail.com> 於 2024年8月28日 週三 下午7:12寫道:
>
> > Hello TengYao! Thanks for taking on this issue, we've been going around
> it
> > for a while.
> >
> > LM1: About the title of the KIP: "Enable ID Generation for Clients over
> the
> > ConsumerGroupHeartbeat RPC". I find it confusing because it hints that
> > we're adding it as an alternative (which was discussed and discarded, in
> > favour of really enforcing it). It's also missing the core change imo,
> > which is "where" the generation happens. So, maybe more to the point with
> > something along the lines of "Client-side generated ID for clients over
> > ConsumerGroupHeartbeat RPC"?
> >
> > LM2: On the public interfaces section, the KIP states that "the server
> will
> > reject the request", but we should agree on the specific error type. I
> > expect it should fail with an InvalidRequestException, is that the
> > intention? (This was also suggested in the discussion thread before but
> is
> > not in the KIP).
> >
> > LM3. Related to my previous point, I find that to be the true
> public-facing
> > change (member ID mandatory at the protocol level), but it's only at the
> > end of the Public interfaces changes, kind of lost among details of how
> > we're going to do it. Should we rephrase that section with the actual
> > change first, and the hows after (ex. Bumping the version is not the
> > public-facing change in this case, it's just the mechanism to properly
> > introduce our change)
> >
> > LM4. Regarding the lifetime of the UUID: the KIP states we will "Verify
> > that the UUID remains consistent across all subsequent heartbeats during
> > the session". What is this "session" referring to here? I would expect
> that
> > the UUID is associated to a consumer instance (generated for the consumer
> > the first time it needs to send a HB if it doesn't have the UUID yet.
> From
> > there on, every time it needs to send a "first HB" again, it will reuse
> its
> > UUID, is that the intention? Note that we should consider that the same
> > consumer instance may have many "first heartbeats", meaning heartbeats to
> > join the group when it's not part of it (ex. consumer unsubscribe +
> > subscribe, fenced, stale). Is this the intention or are you considering
> the
> > lifetime differently? We should clarify it in the KIP.
> >
> > Thanks!
> >
> > Lianet
> >
> > On Tue, Aug 27, 2024 at 2:27 AM TengYao Chi <kiting...@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > I have revised this KIP multiple times based on the feedback from our
> > > discussions.
> > > I would greatly appreciate it if you could review it when you have the
> > > time.
> > > If there are no further comments or suggestions, I plan to proceed with
> > > initiating a vote soon.
> > >
> > > Best regards,
> > > TengYao
> > >
> > > TengYao Chi <kiting...@gmail.com> 於 2024年8月23日 週五 下午2:43寫道:
> > >
> > > > Hi Andrew,
> > > > Thank you for your previous feedback and insights.
> > > > Your contributions have added valuable perspectives to the
> discussions.
> > > > And we also benefit from the comparison of different solutions.
> > > > I’m also looking forward to seeing an initial version in KIP-932, as
> it
> > > > will provide a good reference for future implementations.
> > > >
> > > > Regarding your comment on AS2, I wanted to clarify that my
> > specification
> > > > references org.apache.kafka.common.Uuid.
> > > > I believe we’re referring to the same class, and it might just be a
> > small
> > > > oversight due to the busy schedule.
> > > >
> > > > I want to express my gratitude once again for your many insightful
> > > > comments, which have helped the discussion progress smoothly.
> > > >
> > > > Best regards,
> > > > TengYao
> > > >
> > > >
> > > > Andrew Schofield <andrew_schofi...@live.com> 於 2024年8月22日 週四
> > 下午11:28寫道:
> > > >
> > > >> Hi TengYao,
> > > >> I’ve been reading through the comments and I’m happy that the lobby
> > > >> approach has not gained support.
> > > >>
> > > >> Assuming that this KIP is voted, I will be happy to change KIP-932
> so
> > > >> that it only supports client-side member ID generation. Because that
> > KIP
> > > >> is still
> > > >> under development, I can do this in the first version of
> > > >> ShareGroupHeartbeat.
> > > >>
> > > >> AS2: For the encoding section, I suppose the specific encoding which
> > > >> is used is what org.apache.kafka.utils.Uuid uses.
> > > >>
> > > >> Thanks,
> > > >> Andrew
> > > >>
> > > >> > On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com>
> wrote:
> > > >> >
> > > >> > Hello Apoorv,
> > > >> > Thank you for your feedback.
> > > >> > Regarding the questions you raised, unfortunately, this KIP cannot
> > > >> > guarantee the order of heartbeats. As with many classic
> distributed
> > > >> system
> > > >> > challenges, what we can do is make our best effort to ensure that
> > > there
> > > >> are
> > > >> > no idle members or stale assignments under normal circumstances.
> > > >> >
> > > >> > As for the lobby approach, I’m not a fan of it because it requires
> > > >> adding a
> > > >> > mechanism to maintain client state within the ConsumerGroup,
> which,
> > in
> > > >> my
> > > >> > view, resembles something like a two-phase commit. This would
> > > introduce
> > > >> > more complexity than the proposal in this KIP, which is something
> we
> > > >> want
> > > >> > to avoid. KIP-848 aims to simplify the existing protocol, and
> while
> > > the
> > > >> > lobby approach is a good one, I believe it is not the right fit
> for
> > > this
> > > >> > particular situation.
> > > >> >
> > > >> > Best regards,
> > > >> > TengYao
> > > >> >
> > > >> > TengYao Chi <kiting...@gmail.com> 於 2024年8月14日 週三 下午11:45寫道:
> > > >> >
> > > >> >> Hi David,
> > > >> >>
> > > >> >> I really appreciate your review and suggestions. As I am still
> > > gaining
> > > >> >> experience in writing KIPs, your input has been incredibly
> > helpful. I
> > > >> am
> > > >> >> currently applying your suggestions to the KIP and will complete
> it
> > > as
> > > >> soon
> > > >> >> as possible.
> > > >> >> Regarding the UUID part, I think we haven’t reached a conclusion
> > > >> yet.(So
> > > >> >> far according to this thread)
> > > >> >> However, I will review the current implementation in the Kafka
> > `Uuid`
> > > >> >> class and include a brief specification in the KIP.
> > > >> >>
> > > >> >> Once again, thank you so much for your help.
> > > >> >>
> > > >> >> Best regards,
> > > >> >> TengYao
> > > >> >>
> > > >> >> Chia-Ping Tsai <chia7...@gmail.com> 於 2024年8月14日 週三 下午11:14寫道:
> > > >> >>
> > > >> >>> hi Apoorv
> > > >> >>>
> > > >> >>>> As the memberId is now known to the client, and client might
> send
> > > the
> > > >> >>> leave
> > > >> >>> group heartbeat on shutdown prior to receiving the initial
> > heartbeat
> > > >> >>> response. If that's true then how do we guarantee that the 2
> > > requests
> > > >> to
> > > >> >>> join and leave will be processed in order, which could still
> leave
> > > >> stale
> > > >> >>> members or throw unknown member id exceptions?
> > > >> >>>
> > > >> >>> This is definitely a good question. the short answer: no
> guarantee
> > > but
> > > >> >>> best
> > > >> >>> efforts
> > > >> >>>
> > > >> >>> Please notice the root cause is "we have no enough time to wait
> > > >> member id
> > > >> >>> (response) when closing consumer". Sadly, we can' guarantee the
> > > >> request
> > > >> >>> order due to the same reason.
> > > >> >>>
> > > >> >>> However, in contrast to previous behavior, there is one big
> > benefit
> > > >> of new
> > > >> >>> approach - we can try STONITH because we know the member id
> > > >> >>>
> > > >> >>> Best,
> > > >> >>> Chia-Ping
> > > >> >>>
> > > >> >>>
> > > >> >>> Apoorv Mittal <apoorvmitta...@gmail.com> 於 2024年8月14日 週三
> > 下午8:55寫道:
> > > >> >>>
> > > >> >>>> Hi TengYao,
> > > >> >>>> Thanks for the KIP. Continuing on the point which Andrew
> > mentioned
> > > as
> > > >> >>> AS1.
> > > >> >>>>
> > > >> >>>> As the memberId is now known to the client, and client might
> send
> > > the
> > > >> >>> leave
> > > >> >>>> group heartbeat on shutdown prior to receiving the initial
> > > heartbeat
> > > >> >>>> response. If that's true then how do we guarantee that the 2
> > > >> requests to
> > > >> >>>> join and leave will be processed in order, which could still
> > leave
> > > >> stale
> > > >> >>>> members or throw unknown member id exceptions?
> > > >> >>>>
> > > >> >>>> Though the client side member id generation is helpful which
> will
> > > >> >>> represent
> > > >> >>>> the same group perspective as from client and broker's end.
> But I
> > > >> think
> > > >> >>> the
> > > >> >>>> major concern we want to solve here is Stale Partition
> > Assignments
> > > >> which
> > > >> >>>> might still exist with the new approach. I am leaning towards
> the
> > > >> >>>> suggestion mentioned by Andrew where partition assignment
> > triggers
> > > on
> > > >> >>>> subsequent heartbeat when client acknowledges the initial
> > > heartbeat,
> > > >> >>>> delayed partition assignment.
> > > >> >>>>
> > > >> >>>> Though on a separate note, I have a different question. What
> > > happens
> > > >> >>> when
> > > >> >>>> there is an issue with the client which sends the initial
> > heartbeat
> > > >> >>> without
> > > >> >>>> memberId, then crashes and restarts? I think we must be
> > > re-triggering
> > > >> >>>> assignments and expiring members only after the heartbeat
> session
> > > >> >>> timeout?
> > > >> >>>> If that's true then shall delayed partition assignment can help
> > > >> benefit
> > > >> >>> us
> > > >> >>>> from this situation as well?
> > > >> >>>>
> > > >> >>>> Regards,
> > > >> >>>> Apoorv Mittal
> > > >> >>>>
> > > >> >>>>
> > > >> >>>> On Wed, Aug 14, 2024 at 12:51 PM David Jacot
> > > >> >>> <dja...@confluent.io.invalid>
> > > >> >>>> wrote:
> > > >> >>>>
> > > >> >>>>> Hi Andrew,
> > > >> >>>>>
> > > >> >>>>> Personally, I don't like the lobby approach. It makes things
> > more
> > > >> >>>>> complicated and it would require changing the records on the
> > > server
> > > >> >>> too.
> > > >> >>>>> This is why I initially suggested the rejected alternative #2
> > > which
> > > >> is
> > > >> >>>>> pretty close but also not perfect.
> > > >> >>>>>
> > > >> >>>>> I'd like to clarify one thing. The ConsumerGroupHeartbeat API
> > > >> already
> > > >> >>>>> supports generating the member id on the client so we don't
> need
> > > any
> > > >> >>>>> conditional logic on the client side. This is actually what we
> > > >> wanted
> > > >> >>> to
> > > >> >>>> do
> > > >> >>>>> in the first place but the idea got pushed back by Magnus back
> > > then
> > > >> >>>> because
> > > >> >>>>> generating uuid from librdkafka required a new dependency. It
> > > turns
> > > >> >>> out
> > > >> >>>>> that librdkafka has that dependency today. In retrospect, we
> > > should
> > > >> >>> have
> > > >> >>>>> pushed back on this. Long story short, we can just do it. The
> > > >> >>> proposal in
> > > >> >>>>> this KIP is to make the member id required in future versions.
> > We
> > > >> >>> could
> > > >> >>>>> also decide not to do it and to keep supporting both
> > approaches. I
> > > >> >>> would
> > > >> >>>>> also be fine with this.
> > > >> >>>>>
> > > >> >>>>> Best,
> > > >> >>>>> David
> > > >> >>>>>
> > > >> >>>>> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield <
> > > >> >>>>> andrew_schofi...@live.com>
> > > >> >>>>> wrote:
> > > >> >>>>>
> > > >> >>>>>> Hi TengYao,
> > > >> >>>>>> Thanks for your response. I’ll have just one more try to
> > > persuade.
> > > >> >>>>>> I feel that I will need to follow the approach with KIP-932
> > when
> > > >> >>> we’ve
> > > >> >>>>>> made a decision, so I do have more than a passing interest in
> > > this.
> > > >> >>>>>>
> > > >> >>>>>> A group member in the lobby is in the group, but it does not
> > have
> > > >> >>> any
> > > >> >>>>>> assignments. A member of a consumer group can have no
> assigned
> > > >> >>>>>> partitions (such as 5 CG members subscribed to a topic with 4
> > > >> >>>>> partitions),
> > > >> >>>>>> so it’s a situation that consumer group members already
> expect.
> > > >> >>>>>>
> > > >> >>>>>> One of Kafka’s strengths is the way that we handle API
> > > versioning.
> > > >> >>>>>> But, there is a cost - the behaviour is different depending
> on
> > > the
> > > >> >>> RPC
> > > >> >>>>>> version. KIP-848 is on the cusp of completion, but we’re
> > already
> > > >> >>> adding
> > > >> >>>>>> conditional logic for v0/v1 for ConsumerGroupHeartbeat.
> That’s
> > a
> > > >> >>> pity.
> > > >> >>>>>> Only a minor issue, but it’s unfortunate.
> > > >> >>>>>>
> > > >> >>>>>> Thanks,
> > > >> >>>>>> Andrew
> > > >> >>>>>>
> > > >> >>>>>>> On 14 Aug 2024, at 08:47, TengYao Chi <kiting...@gmail.com>
> > > >> >>> wrote:
> > > >> >>>>>>>
> > > >> >>>>>>> Hello Andrew
> > > >> >>>>>>> Thank you for your thoughtful suggestions and getting the
> > > >> >>> discussion
> > > >> >>>>>> going.
> > > >> >>>>>>>
> > > >> >>>>>>> To AS1:
> > > >> >>>>>>> In the current scenario where the server generates the UUID,
> > if
> > > >> >>> the
> > > >> >>>>>> client
> > > >> >>>>>>> shuts down before receiving the memberId generated by the GC
> > > >> >>>>> (regardless
> > > >> >>>>>> of
> > > >> >>>>>>> whether it’s a graceful shutdown or not), the GC will still
> > have
> > > >> >>> to
> > > >> >>>>> wait
> > > >> >>>>>>> for the heartbeat timeout because the client doesn’t know
> its
> > > >> >>>> memberId.
> > > >> >>>>>>> This KIP indeed cannot completely resolve the idempotency
> > issue,
> > > >> >>> but
> > > >> >>>> it
> > > >> >>>>>> can
> > > >> >>>>>>> better handle shutdown scenarios under normal circumstances
> > > >> >>> because
> > > >> >>>> the
> > > >> >>>>>>> client always knows its memberId. Even if the client shuts
> > down
> > > >> >>>>>> immediately
> > > >> >>>>>>> after the initial heartbeat, as long as it performs a
> graceful
> > > >> >>>> shutdown
> > > >> >>>>>> and
> > > >> >>>>>>> sends a leave heartbeat, the GC can manage the situation and
> > > >> >>> remove
> > > >> >>>> the
> > > >> >>>>>>> member. Therefore, the goal of this KIP is to address the
> > issue
> > > >> >>> where
> > > >> >>>>> the
> > > >> >>>>>>> GC has to wait for the heartbeat timeout due to the client
> > > leaving
> > > >> >>>>>> without
> > > >> >>>>>>> knowing its memberId, which leads to reduced throughput and
> > > >> >>> limited
> > > >> >>>>>>> scalability.
> > > >> >>>>>>>
> > > >> >>>>>>> The solution you suggest has also been proposed by David.
> The
> > > >> >>> concern
> > > >> >>>>>> with
> > > >> >>>>>>> this approach is that it introduces additional complexity
> for
> > > >> >>>>>>> compatibility, as the new server would not immediately add
> the
> > > >> >>> member
> > > >> >>>>> to
> > > >> >>>>>>> the group, while the old server would. This requires clients
> > to
> > > >> >>>>>>> differentiate whether their memberId has been added to the
> > group
> > > >> >>> or
> > > >> >>>>> not,
> > > >> >>>>>>> which could result in unexpected logs.
> > > >> >>>>>>>
> > > >> >>>>>>> Best Regards,
> > > >> >>>>>>> TengYao
> > > >> >>>>>>>
> > > >> >>>>>>> Andrew Schofield <andrew_schofi...@live.com> 於 2024年8月14日
> 週三
> > > >> >>>>> 上午12:29寫道:
> > > >> >>>>>>>
> > > >> >>>>>>>> Hi TengYao,
> > > >> >>>>>>>> Thanks for the KIP. I wonder if there’s a different way to
> > > close
> > > >> >>>> what
> > > >> >>>>>>>> is quite a small window.
> > > >> >>>>>>>>
> > > >> >>>>>>>> AS1: It is true that the initial heartbeat is not
> idempotent,
> > > but
> > > >> >>>> this
> > > >> >>>>>>>> remains
> > > >> >>>>>>>> true with this KIP. It’s just differently not idempotent.
> If
> > > the
> > > >> >>>>> client
> > > >> >>>>>>>> makes its
> > > >> >>>>>>>> own member ID, sends a request and dies, the GC will still
> > have
> > > >> >>>> added
> > > >> >>>>>>>> the member to the group and it will hang around until the
> > > session
> > > >> >>>>>> expires.
> > > >> >>>>>>>>
> > > >> >>>>>>>> I wonder if the GC could still generate the member ID in
> > > >> >>> response to
> > > >> >>>>> the
> > > >> >>>>>>>> first
> > > >> >>>>>>>> heartbeat, and put the member in a special PENDING state
> with
> > > no
> > > >> >>>>>>>> assignments until the client sends the next heartbeat, thus
> > > >> >>>> confirming
> > > >> >>>>>> it
> > > >> >>>>>>>> has received the member ID. This would not be a protocol
> > change
> > > >> >>> at
> > > >> >>>>> all,
> > > >> >>>>>>>> just
> > > >> >>>>>>>> a change to the GC to keep a new member in the lobby until
> > it’s
> > > >> >>>>>> comfirmed
> > > >> >>>>>>>> it knows its member ID.
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>> Thanks,
> > > >> >>>>>>>> Andrew
> > > >> >>>>>>>>
> > > >> >>>>>>>>> On 13 Aug 2024, at 15:59, TengYao Chi <
> kiting...@gmail.com>
> > > >> >>> wrote:
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> Hi Chia-Ping,
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> Thanks for review and suggestions.
> > > >> >>>>>>>>> I have updated the content of KIP accordingly.
> > > >> >>>>>>>>> Please take a look.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> Best regards,
> > > >> >>>>>>>>> TengYao
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> Chia-Ping Tsai <chia7...@apache.org> 於 2024年8月13日 週二
> > > 下午9:45寫道:
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>> hi TengYao
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> thanks for this KIP.
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> 1) could you please describe the before/after behavior in
> > the
> > > >> >>>>>> "Proposed
> > > >> >>>>>>>>>> Changes" section? IIRC, current RPC allows HB having
> member
> > > id
> > > >> >>>>>>>> generated by
> > > >> >>>>>>>>>> client, right? If HB has no member ID, server will
> generate
> > > one
> > > >> >>>> and
> > > >> >>>>>> then
> > > >> >>>>>>>>>> return. The new behavior will enforce HB "must" have
> member
> > > ID.
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> 2) could you please write the version number explicitly
> in
> > > the
> > > >> >>> KIP
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> 3) how new client code handle the old HB? Does it always
> > > >> >>> generate
> > > >> >>>>>> member
> > > >> >>>>>>>>>> ID on client-side even though that is not restricted?
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> Best,
> > > >> >>>>>>>>>> Chia-Ping
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> On 2024/08/13 06:20:42 TengYao Chi wrote:
> > > >> >>>>>>>>>>> Hello everyone,
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> I would like to start a discussion thread on KIP-1082,
> > which
> > > >> >>>>> proposes
> > > >> >>>>>>>>>>> enabling id generation for clients over the
> > > >> >>>> ConsumerGroupHeartbeat
> > > >> >>>>>> RPC.
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> Here is the KIP Link: KIP-1082
> > > >> >>>>>>>>>>> <
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>
> > > >> >>>>>
> > > >> >>>>
> > > >> >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> Please take a look and let me know what you think, and I
> > > would
> > > >> >>>>>>>> appreciate
> > > >> >>>>>>>>>>> any suggestions and feedback.
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> Best regards,
> > > >> >>>>>>>>>>> TengYao
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>
> > > >> >>>>
> > > >> >>>
> > > >> >>
> > > >>
> > > >>
> > >
> >
>

Reply via email to