Hi everyone,

I have revised this KIP multiple times based on the feedback from our
discussions.
I would greatly appreciate it if you could review it when you have the
time.
If there are no further comments or suggestions, I plan to proceed with
initiating a vote soon.

Best regards,
TengYao

On Fri, Aug 23, 2024 at 2:43 PM, TengYao Chi <kiting...@gmail.com> wrote:

> Hi Andrew,
> Thank you for your previous feedback and insights.
> Your contributions have added valuable perspectives to the discussions.
> We have also benefited from comparing the different solutions.
> I’m also looking forward to seeing an initial version in KIP-932, as it
> will provide a good reference for future implementations.
>
> Regarding your comment on AS2, I wanted to clarify that my specification
> references org.apache.kafka.common.Uuid.
> I believe we’re referring to the same class, and it might just be a small
> oversight due to the busy schedule.
>
> I want to express my gratitude once again for your many insightful
> comments, which have helped the discussion progress smoothly.
>
> Best regards,
> TengYao
>
>
> On Thu, Aug 22, 2024 at 11:28 PM, Andrew Schofield <andrew_schofi...@live.com> wrote:
>
>> Hi TengYao,
>> I’ve been reading through the comments and I’m happy that the lobby
>> approach has not gained support.
>>
>> Assuming that this KIP passes the vote, I will be happy to change KIP-932
>> so that it only supports client-side member ID generation. Because that
>> KIP is still under development, I can do this in the first version of
>> ShareGroupHeartbeat.
>>
>> AS2: For the encoding section, I suppose the specific encoding which
>> is used is what org.apache.kafka.utils.Uuid uses.
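
For reference, here is a minimal sketch of the encoding being discussed. It
assumes, based on my reading of the Kafka `Uuid` class, that a member ID is a
random 128-bit UUID rendered as URL-safe Base64 without padding (22
characters), and that IDs starting with "-" are regenerated. The function
names are hypothetical and this is not the Kafka implementation.

```python
import base64
import uuid


def encode_member_id(raw: uuid.UUID) -> str:
    """Encode the 16 UUID bytes as URL-safe Base64 with '=' padding removed."""
    return base64.urlsafe_b64encode(raw.bytes).rstrip(b"=").decode("ascii")


def decode_member_id(text: str) -> uuid.UUID:
    """Reverse of encode_member_id: restore the padding, then decode."""
    padded = text + "=" * (-len(text) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


def generate_member_id() -> str:
    """Generate a random member ID on the client (hypothetical helper).

    Assumption: IDs beginning with '-' are rejected, since they are awkward
    to pass around as command-line arguments.
    """
    while True:
        candidate = encode_member_id(uuid.uuid4())
        if not candidate.startswith("-"):
            return candidate
```

Under these assumptions every generated ID is exactly 22 characters long and
round-trips through decode_member_id unchanged.
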
>>
>> Thanks,
>> Andrew
>>
>> > On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com> wrote:
>> >
>> > Hello Apoorv,
>> > Thank you for your feedback.
>> > Regarding the questions you raised, unfortunately, this KIP cannot
>> > guarantee the order of heartbeats. As with many classic distributed
>> > system challenges, what we can do is make our best effort to ensure that
>> > there are no idle members or stale assignments under normal
>> > circumstances.
>> >
>> > As for the lobby approach, I’m not a fan of it because it requires
>> > adding a mechanism to maintain client state within the ConsumerGroup,
>> > which, in my view, resembles something like a two-phase commit. This
>> > would introduce more complexity than the proposal in this KIP, which is
>> > something we want to avoid. KIP-848 aims to simplify the existing
>> > protocol, and while the lobby approach is a good one, I believe it is not
>> > the right fit for this particular situation.
>> >
>> > Best regards,
>> > TengYao
>> >
>> > On Wed, Aug 14, 2024 at 11:45 PM, TengYao Chi <kiting...@gmail.com> wrote:
>> >
>> >> Hi David,
>> >>
>> >> I really appreciate your review and suggestions. As I am still gaining
>> >> experience in writing KIPs, your input has been incredibly helpful. I am
>> >> currently applying your suggestions to the KIP and will complete it as
>> >> soon as possible.
>> >> Regarding the UUID part, I think we haven’t reached a conclusion yet
>> >> (at least so far in this thread).
>> >> However, I will review the current implementation in the Kafka `Uuid`
>> >> class and include a brief specification in the KIP.
>> >>
>> >> Once again, thank you so much for your help.
>> >>
>> >> Best regards,
>> >> TengYao
>> >>
>> >> On Wed, Aug 14, 2024 at 11:14 PM, Chia-Ping Tsai <chia7...@gmail.com> wrote:
>> >>
>> >>> hi Apoorv
>> >>>
>> >>>> Since the memberId is now known to the client, the client might send
>> >>>> the leave group heartbeat on shutdown prior to receiving the initial
>> >>>> heartbeat response. If that's true then how do we guarantee that the 2
>> >>>> requests to join and leave will be processed in order, which could
>> >>>> still leave stale members or throw unknown member id exceptions?
>> >>>
>> >>> This is definitely a good question. The short answer: no guarantee,
>> >>> only best effort.
>> >>>
>> >>> Please note that the root cause is "we do not have enough time to wait
>> >>> for the member id (response) when closing the consumer". Sadly, we
>> >>> can't guarantee the request order for the same reason.
>> >>>
>> >>> However, in contrast to the previous behavior, there is one big benefit
>> >>> of the new approach: we can try STONITH because we know the member id.
>> >>>
>> >>> Best,
>> >>> Chia-Ping
>> >>>
>> >>>
>> >>> On Wed, Aug 14, 2024 at 8:55 PM, Apoorv Mittal <apoorvmitta...@gmail.com> wrote:
>> >>>
>> >>>> Hi TengYao,
>> >>>> Thanks for the KIP. Continuing on the point which Andrew mentioned as
>> >>>> AS1.
>> >>>>
>> >>>> Since the memberId is now known to the client, the client might send
>> >>>> the leave group heartbeat on shutdown prior to receiving the initial
>> >>>> heartbeat response. If that's true then how do we guarantee that the 2
>> >>>> requests to join and leave will be processed in order, which could
>> >>>> still leave stale members or throw unknown member id exceptions?
>> >>>>
>> >>>> Client-side member id generation is helpful because the client and the
>> >>>> broker then share the same view of the group. However, I think the
>> >>>> major concern we want to solve here is stale partition assignments,
>> >>>> which might still exist with the new approach. I am leaning towards the
>> >>>> suggestion mentioned by Andrew, where partition assignment is triggered
>> >>>> on the subsequent heartbeat, once the client has acknowledged the
>> >>>> initial heartbeat response (delayed partition assignment).
>> >>>>
>> >>>> On a separate note, I have a different question. What happens when
>> >>>> there is an issue with a client which sends the initial heartbeat
>> >>>> without a memberId, then crashes and restarts? I think we would be
>> >>>> re-triggering assignments and expiring members only after the heartbeat
>> >>>> session timeout? If that's true, could delayed partition assignment
>> >>>> help us in this situation as well?
>> >>>>
>> >>>> Regards,
>> >>>> Apoorv Mittal
>> >>>>
>> >>>>
>> >>>> On Wed, Aug 14, 2024 at 12:51 PM David Jacot <dja...@confluent.io.invalid>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi Andrew,
>> >>>>>
>> >>>>> Personally, I don't like the lobby approach. It makes things more
>> >>>>> complicated and it would require changing the records on the server
>> >>>>> too. This is why I initially suggested the rejected alternative #2
>> >>>>> which is pretty close but also not perfect.
>> >>>>>
>> >>>>> I'd like to clarify one thing. The ConsumerGroupHeartbeat API already
>> >>>>> supports generating the member id on the client so we don't need any
>> >>>>> conditional logic on the client side. This is actually what we wanted
>> >>>>> to do in the first place but the idea got pushed back by Magnus back
>> >>>>> then because generating uuid from librdkafka required a new
>> >>>>> dependency. It turns out that librdkafka has that dependency today. In
>> >>>>> retrospect, we should have pushed back on this. Long story short, we
>> >>>>> can just do it. The proposal in this KIP is to make the member id
>> >>>>> required in future versions. We could also decide not to do it and to
>> >>>>> keep supporting both approaches. I would also be fine with this.
>> >>>>>
>> >>>>> Best,
>> >>>>> David
>> >>>>>
>> >>>>> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield
>> >>>>> <andrew_schofi...@live.com> wrote:
>> >>>>>
>> >>>>>> Hi TengYao,
>> >>>>>> Thanks for your response. I’ll have just one more try to persuade.
>> >>>>>> I feel that I will need to follow the approach with KIP-932 when
>> >>>>>> we’ve made a decision, so I do have more than a passing interest in
>> >>>>>> this.
>> >>>>>>
>> >>>>>> A group member in the lobby is in the group, but it does not have
>> >>>>>> any assignments. A member of a consumer group can have no assigned
>> >>>>>> partitions (such as 5 CG members subscribed to a topic with 4
>> >>>>>> partitions), so it’s a situation that consumer group members already
>> >>>>>> expect.
>> >>>>>>
>> >>>>>> One of Kafka’s strengths is the way that we handle API versioning.
>> >>>>>> But, there is a cost - the behaviour is different depending on the
>> >>>>>> RPC version. KIP-848 is on the cusp of completion, but we’re already
>> >>>>>> adding conditional logic for v0/v1 for ConsumerGroupHeartbeat. That’s
>> >>>>>> a pity. Only a minor issue, but it’s unfortunate.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Andrew
>> >>>>>>
>> >>>>>>> On 14 Aug 2024, at 08:47, TengYao Chi <kiting...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> Hello Andrew,
>> >>>>>>> Thank you for your thoughtful suggestions and getting the
>> >>>>>>> discussion going.
>> >>>>>>>
>> >>>>>>> To AS1:
>> >>>>>>> In the current scenario where the server generates the UUID, if
>> >>>>>>> the client shuts down before receiving the memberId generated by
>> >>>>>>> the GC (regardless of whether it’s a graceful shutdown or not), the
>> >>>>>>> GC will still have to wait for the heartbeat timeout because the
>> >>>>>>> client doesn’t know its memberId.
>> >>>>>>> This KIP indeed cannot completely resolve the idempotency issue,
>> >>>>>>> but it can better handle shutdown scenarios under normal
>> >>>>>>> circumstances because the client always knows its memberId. Even if
>> >>>>>>> the client shuts down immediately after the initial heartbeat, as
>> >>>>>>> long as it performs a graceful shutdown and sends a leave
>> >>>>>>> heartbeat, the GC can manage the situation and remove the member.
>> >>>>>>> Therefore, the goal of this KIP is to address the issue where the
>> >>>>>>> GC has to wait for the heartbeat timeout due to the client leaving
>> >>>>>>> without knowing its memberId, which leads to reduced throughput and
>> >>>>>>> limited scalability.
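
To make the shutdown scenario above concrete, here is a toy model, offered
only as a sketch. ToyCoordinator and shutdown_right_after_join are
hypothetical names, not Kafka code, and the epoch values (0 to join, -1 to
leave) reflect my understanding of KIP-848 rather than anything stated in
this thread.

```python
import uuid

JOIN_EPOCH = 0    # assumption: epoch 0 means "join", as in KIP-848
LEAVE_EPOCH = -1  # assumption: epoch -1 means "leave"


class ToyCoordinator:
    """Toy stand-in for the group coordinator (GC); not Kafka code."""

    def __init__(self):
        self.members = set()

    def heartbeat(self, member_id, epoch):
        """Process one heartbeat and return the member ID it applied to."""
        if epoch == LEAVE_EPOCH:
            self.members.discard(member_id)
            return member_id
        if not member_id:
            # Old behaviour: the server generates the ID on the first heartbeat.
            member_id = uuid.uuid4().hex
        self.members.add(member_id)
        return member_id


def shutdown_right_after_join(client_generates_id):
    """Client joins, then shuts down gracefully before reading the response."""
    gc = ToyCoordinator()
    known_id = uuid.uuid4().hex if client_generates_id else ""
    gc.heartbeat(known_id, JOIN_EPOCH)  # the response is never read
    if known_id:
        # Only a client that knows its ID can send the leave heartbeat.
        gc.heartbeat(known_id, LEAVE_EPOCH)
    return len(gc.members)  # members left behind until the session timeout
```

In this model a server-generated ID leaves one member behind until the
session timeout, while a client-generated ID leaves none. The model processes
requests strictly in order, so it says nothing about the reordering case.
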
>> >>>>>>>
>> >>>>>>> The solution you suggest has also been proposed by David. The
>> >>>>>>> concern with this approach is that it introduces additional
>> >>>>>>> complexity for compatibility, as the new server would not
>> >>>>>>> immediately add the member to the group, while the old server
>> >>>>>>> would. This requires clients to differentiate whether their
>> >>>>>>> memberId has been added to the group or not, which could result in
>> >>>>>>> unexpected logs.
>> >>>>>>>
>> >>>>>>> Best Regards,
>> >>>>>>> TengYao
>> >>>>>>>
>> >>>>>>> On Wed, Aug 14, 2024 at 12:29 AM, Andrew Schofield <andrew_schofi...@live.com> wrote:
>> >>>>>>>
>> >>>>>>>> Hi TengYao,
>> >>>>>>>> Thanks for the KIP. I wonder if there’s a different way to close
>> >>>>>>>> what is quite a small window.
>> >>>>>>>>
>> >>>>>>>> AS1: It is true that the initial heartbeat is not idempotent, but
>> >>>>>>>> this remains true with this KIP. It’s just differently not
>> >>>>>>>> idempotent. If the client makes its own member ID, sends a request
>> >>>>>>>> and dies, the GC will still have added the member to the group and
>> >>>>>>>> it will hang around until the session expires.
>> >>>>>>>>
>> >>>>>>>> I wonder if the GC could still generate the member ID in response
>> >>>>>>>> to the first heartbeat, and put the member in a special PENDING
>> >>>>>>>> state with no assignments until the client sends the next
>> >>>>>>>> heartbeat, thus confirming it has received the member ID. This
>> >>>>>>>> would not be a protocol change at all, just a change to the GC to
>> >>>>>>>> keep a new member in the lobby until it’s confirmed it knows its
>> >>>>>>>> member ID.
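
For comparison, the lobby idea described above could be modelled roughly as
follows. This is only a sketch under stated assumptions: LobbyCoordinator, the
"PENDING"/"STABLE" state strings and assignable_members are hypothetical, and
the thread does not specify how the GC would actually track this state.

```python
import uuid


class LobbyCoordinator:
    """Toy GC that keeps new members in a PENDING "lobby"; not Kafka code."""

    def __init__(self):
        self.state = {}  # member ID -> "PENDING" or "STABLE"

    def heartbeat(self, member_id=""):
        if not member_id:
            # First heartbeat: generate the ID but hand out no assignments yet.
            member_id = uuid.uuid4().hex
            self.state[member_id] = "PENDING"
        elif member_id in self.state:
            # Echoing the ID back proves the client received it.
            self.state[member_id] = "STABLE"
        else:
            raise KeyError("unknown member id")
        return member_id

    def assignable_members(self):
        """Only confirmed members take part in partition assignment."""
        return [m for m, s in self.state.items() if s == "STABLE"]
```

A client that dies after its first heartbeat stays PENDING and never receives
partitions, which is the window this approach tries to close. The cost is the
extra per-member state that the later replies object to.
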
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> Andrew
>> >>>>>>>>
>> >>>>>>>>> On 13 Aug 2024, at 15:59, TengYao Chi <kiting...@gmail.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Hi Chia-Ping,
>> >>>>>>>>>
>> >>>>>>>>> Thanks for the review and suggestions.
>> >>>>>>>>> I have updated the content of the KIP accordingly.
>> >>>>>>>>> Please take a look.
>> >>>>>>>>>
>> >>>>>>>>> Best regards,
>> >>>>>>>>> TengYao
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Aug 13, 2024 at 9:45 PM, Chia-Ping Tsai <chia7...@apache.org> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> hi TengYao
>> >>>>>>>>>>
>> >>>>>>>>>> thanks for this KIP.
>> >>>>>>>>>>
>> >>>>>>>>>> 1) Could you please describe the before/after behavior in the
>> >>>>>>>>>> "Proposed Changes" section? IIRC, the current RPC allows the HB
>> >>>>>>>>>> to carry a member ID generated by the client, right? If the HB
>> >>>>>>>>>> has no member ID, the server will generate one and return it.
>> >>>>>>>>>> The new behavior will enforce that the HB "must" have a member
>> >>>>>>>>>> ID.
>> >>>>>>>>>>
>> >>>>>>>>>> 2) Could you please write the version number explicitly in the
>> >>>>>>>>>> KIP?
>> >>>>>>>>>>
>> >>>>>>>>>> 3) How does the new client code handle the old HB? Does it
>> >>>>>>>>>> always generate the member ID on the client side even though
>> >>>>>>>>>> that is not required?
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> Chia-Ping
>> >>>>>>>>>>
>> >>>>>>>>>> On 2024/08/13 06:20:42 TengYao Chi wrote:
>> >>>>>>>>>>> Hello everyone,
>> >>>>>>>>>>>
>> >>>>>>>>>>> I would like to start a discussion thread on KIP-1082, which
>> >>>>>>>>>>> proposes enabling id generation for clients over the
>> >>>>>>>>>>> ConsumerGroupHeartbeat RPC.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Here is the KIP Link: KIP-1082
>> >>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Please take a look and let me know what you think, and I would
>> >>>>>>>>>>> appreciate any suggestions and feedback.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best regards,
>> >>>>>>>>>>> TengYao
