Hi TengYao,
I’ve been reading through the comments and I’m happy that the lobby
approach has not gained support.

Assuming that this KIP is accepted, I will be happy to change KIP-932 so
that it only supports client-side member ID generation. Because that KIP is
still under development, I can do this in the first version of
ShareGroupHeartbeat.

AS2: For the encoding section, I suppose the specific encoding to use
is the one that org.apache.kafka.common.Uuid uses.

Thanks,
Andrew

> On 14 Aug 2024, at 17:00, TengYao Chi <kiting...@gmail.com> wrote:
>
> Hello Apoorv,
> Thank you for your feedback.
> Regarding the questions you raised, unfortunately, this KIP cannot
> guarantee the order of heartbeats. As with many classic distributed system
> challenges, what we can do is make our best effort to ensure that there are
> no idle members or stale assignments under normal circumstances.
>
> As for the lobby approach, I’m not a fan of it because it requires adding a
> mechanism to maintain client state within the ConsumerGroup, which, in my
> view, resembles something like a two-phase commit. This would introduce
> more complexity than the proposal in this KIP, which is something we want
> to avoid. KIP-848 aims to simplify the existing protocol, and while the
> lobby approach is a good one, I believe it is not the right fit for this
> particular situation.
>
> Best regards,
> TengYao
>
> On Wed, Aug 14, 2024 at 11:45 PM TengYao Chi <kiting...@gmail.com> wrote:
>
>> Hi David,
>>
>> I really appreciate your review and suggestions. As I am still gaining
>> experience in writing KIPs, your input has been incredibly helpful. I am
>> currently applying your suggestions to the KIP and will complete it as soon
>> as possible.
>> Regarding the UUID part, I think we haven’t reached a conclusion yet (at
>> least so far in this thread).
>> However, I will review the current implementation in the Kafka `Uuid`
>> class and include a brief specification in the KIP.
>>
>> Once again, thank you so much for your help.
>>
>> Best regards,
>> TengYao
>>
>> On Wed, Aug 14, 2024 at 11:14 PM Chia-Ping Tsai <chia7...@gmail.com> wrote:
>>
>>> hi Apoorv
>>>
>>>> As the memberId is now known to the client, and client might send the
>>>> leave group heartbeat on shutdown prior to receiving the initial
>>>> heartbeat response. If that's true then how do we guarantee that the 2
>>>> requests to join and leave will be processed in order, which could still
>>>> leave stale members or throw unknown member id exceptions?
>>>
>>> This is definitely a good question. The short answer: no guarantee, but
>>> best effort.
>>>
>>> Please note that the root cause is "we do not have enough time to wait
>>> for the member id (response) when closing the consumer". Sadly, we can't
>>> guarantee the request order for the same reason.
>>>
>>> However, in contrast to the previous behavior, there is one big benefit
>>> of the new approach: we can try STONITH because we know the member id.
>>>
>>> Best,
>>> Chia-Ping
>>>
>>>
>>> On Wed, Aug 14, 2024 at 8:55 PM Apoorv Mittal <apoorvmitta...@gmail.com> wrote:
>>>
>>>> Hi TengYao,
>>>> Thanks for the KIP. Continuing on the point which Andrew mentioned as
>>>> AS1.
>>>>
>>>> As the memberId is now known to the client, the client might send the
>>>> leave group heartbeat on shutdown prior to receiving the initial
>>>> heartbeat response. If that's true, then how do we guarantee that the 2
>>>> requests to join and leave will be processed in order? If they are not,
>>>> we could still be left with stale members or unknown member id
>>>> exceptions.
>>>>
>>>> Client-side member id generation is helpful because the client and the
>>>> broker will then share the same view of the group. But I think the major
>>>> concern we want to solve here is stale partition assignments, which
>>>> might still exist with the new approach. I am leaning towards the
>>>> suggestion mentioned by Andrew, delayed partition assignment, where
>>>> partition assignment is triggered on the subsequent heartbeat, once the
>>>> client has acknowledged the initial heartbeat.
>>>>
>>>> On a separate note, I have a different question. What happens when a
>>>> client sends the initial heartbeat without a memberId, then crashes and
>>>> restarts? I think we would re-trigger assignments and expire the member
>>>> only after the heartbeat session timeout. If that's true, could delayed
>>>> partition assignment help us in this situation as well?
>>>>
>>>> Regards,
>>>> Apoorv Mittal
>>>>
>>>>
>>>> On Wed, Aug 14, 2024 at 12:51 PM David Jacot <dja...@confluent.io.invalid>
>>>> wrote:
>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> Personally, I don't like the lobby approach. It makes things more
>>>>> complicated and it would require changing the records on the server
>>>>> too. This is why I initially suggested the rejected alternative #2 which
>>>>> is pretty close but also not perfect.
>>>>>
>>>>> I'd like to clarify one thing. The ConsumerGroupHeartbeat API already
>>>>> supports generating the member id on the client so we don't need any
>>>>> conditional logic on the client side. This is actually what we wanted
>>>>> to do in the first place but the idea got pushed back by Magnus back
>>>>> then because generating uuid from librdkafka required a new dependency.
>>>>> It turns out that librdkafka has that dependency today. In retrospect,
>>>>> we should have pushed back on this. Long story short, we can just do it.
>>>>> The proposal in this KIP is to make the member id required in future
>>>>> versions. We could also decide not to do it and to keep supporting both
>>>>> approaches. I would also be fine with this.
>>>>>
>>>>> Best,
>>>>> David
>>>>>
>>>>> On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield <
>>>>> andrew_schofi...@live.com>
>>>>> wrote:
>>>>>
>>>>>> Hi TengYao,
>>>>>> Thanks for your response. I’ll have just one more try to persuade.
>>>>>> I feel that I will need to follow the approach with KIP-932 when
>>>>>> we’ve made a decision, so I do have more than a passing interest in
>>>>>> this.
>>>>>>
>>>>>> A group member in the lobby is in the group, but it does not have
>>>>>> any assignments. A member of a consumer group can have no assigned
>>>>>> partitions (such as 5 CG members subscribed to a topic with 4
>>>>>> partitions), so it’s a situation that consumer group members already
>>>>>> expect.
>>>>>>
>>>>>> One of Kafka’s strengths is the way that we handle API versioning.
>>>>>> But, there is a cost - the behaviour is different depending on the
>>>>>> RPC version. KIP-848 is on the cusp of completion, but we’re already
>>>>>> adding conditional logic for v0/v1 for ConsumerGroupHeartbeat. That’s a
>>>>>> pity. Only a minor issue, but it’s unfortunate.
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew
>>>>>>
>>>>>>> On 14 Aug 2024, at 08:47, TengYao Chi <kiting...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hello Andrew
>>>>>>> Thank you for your thoughtful suggestions and getting the discussion
>>>>>>> going.
>>>>>>>
>>>>>>> To AS1:
>>>>>>> In the current scenario where the server generates the UUID, if the
>>>>>>> client shuts down before receiving the memberId generated by the GC
>>>>>>> (regardless of whether it’s a graceful shutdown or not), the GC will
>>>>>>> still have to wait for the heartbeat timeout because the client doesn’t
>>>>>>> know its memberId. This KIP indeed cannot completely resolve the
>>>>>>> idempotency issue, but it can better handle shutdown scenarios under
>>>>>>> normal circumstances because the client always knows its memberId. Even
>>>>>>> if the client shuts down immediately after the initial heartbeat, as
>>>>>>> long as it performs a graceful shutdown and sends a leave heartbeat, the
>>>>>>> GC can manage the situation and remove the member. Therefore, the goal
>>>>>>> of this KIP is to address the issue where the GC has to wait for the
>>>>>>> heartbeat timeout due to the client leaving without knowing its
>>>>>>> memberId, which leads to reduced throughput and limited scalability.
>>>>>>>
>>>>>>> The solution you suggest has also been proposed by David. The concern
>>>>>>> with this approach is that it introduces additional complexity for
>>>>>>> compatibility, as the new server would not immediately add the member
>>>>>>> to the group, while the old server would. This requires clients to
>>>>>>> differentiate whether their memberId has been added to the group or
>>>>>>> not, which could result in unexpected logs.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> TengYao
>>>>>>>
>>>>>>> On Wed, Aug 14, 2024 at 12:29 AM Andrew Schofield <andrew_schofi...@live.com> wrote:
>>>>>>>
>>>>>>>> Hi TengYao,
>>>>>>>> Thanks for the KIP. I wonder if there’s a different way to close
>>>>>>>> what is quite a small window.
>>>>>>>>
>>>>>>>> AS1: It is true that the initial heartbeat is not idempotent, but
>>>>>>>> this remains true with this KIP. It’s just differently not idempotent.
>>>>>>>> If the client makes its own member ID, sends a request and dies, the GC
>>>>>>>> will still have added the member to the group and it will hang around
>>>>>>>> until the session expires.
>>>>>>>>
>>>>>>>> I wonder if the GC could still generate the member ID in response to
>>>>>>>> the first heartbeat, and put the member in a special PENDING state with
>>>>>>>> no assignments until the client sends the next heartbeat, thus
>>>>>>>> confirming it has received the member ID. This would not be a protocol
>>>>>>>> change at all, just a change to the GC to keep a new member in the
>>>>>>>> lobby until it’s confirmed it knows its member ID.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>>> On 13 Aug 2024, at 15:59, TengYao Chi <kiting...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Chia-Ping,
>>>>>>>>>
>>>>>>>>> Thanks for review and suggestions.
>>>>>>>>> I have updated the content of KIP accordingly.
>>>>>>>>> Please take a look.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> TengYao
>>>>>>>>>
>>>>>>>>> On Tue, Aug 13, 2024 at 9:45 PM Chia-Ping Tsai <chia7...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> hi TengYao
>>>>>>>>>>
>>>>>>>>>> thanks for this KIP.
>>>>>>>>>>
>>>>>>>>>> 1) could you please describe the before/after behavior in the
>>>>>>>>>> "Proposed Changes" section? IIRC, the current RPC allows the HB to
>>>>>>>>>> carry a member ID generated by the client, right? If the HB has no
>>>>>>>>>> member ID, the server will generate one and return it. The new
>>>>>>>>>> behavior will enforce that the HB "must" have a member ID.
>>>>>>>>>>
>>>>>>>>>> 2) could you please write the version number explicitly in the KIP?
>>>>>>>>>>
>>>>>>>>>> 3) how does the new client code handle the old HB? Does it always
>>>>>>>>>> generate the member ID on the client side even though that is not
>>>>>>>>>> enforced?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Chia-Ping
>>>>>>>>>>
>>>>>>>>>> On 2024/08/13 06:20:42 TengYao Chi wrote:
>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>
>>>>>>>>>>> I would like to start a discussion thread on KIP-1082, which
>>>>>>>>>>> proposes enabling id generation for clients over the
>>>>>>>>>>> ConsumerGroupHeartbeat RPC.
>>>>>>>>>>>
>>>>>>>>>>> Here is the KIP Link: KIP-1082
>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Enable+ID+Generation+for+Clients+over+the+ConsumerGroupHeartbeat+RPC>
>>>>>>>>>>>
>>>>>>>>>>> Please take a look and let me know what you think, and I would
>>>>>>>>>>> appreciate any suggestions and feedback.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> TengYao
>>>>>>>>>>>
