Hi Chia-Ping / David / Andrew,

Thanks for the review and suggestions.

DJ01: Removed all implementation details.

DJ02: Does “incrementally” mean that we only recalculate the parts that changed?
For example, if the number of partitions changes, we would only recompute the
hash of the partition count and fold it back into the topic hash.
IMO, we only calculate the topic hash once. With the cache mechanism, the value
can be reused by different groups on the same broker, so the CPU usage for this
part is not very high.
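
To make it concrete, here is a rough sketch of the cache I have in mind (the
class and method names are hypothetical, and Guava’s Murmur3 is only a stand-in
for whichever Murmur3 implementation we end up using):

import com.google.common.hash.HashFunction;
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class TopicHashCache {
    // Guava's Murmur3 stands in here for the final choice of Murmur3 implementation.
    private static final HashFunction MURMUR3 = Hashing.murmur3_128();

    // topic name -> cached hash, shared by every group on the same broker
    private final Map<String, Long> cache = new ConcurrentHashMap<>();

    public long topicHash(String topic, int numPartitions, Map<Integer, List<String>> racksByPartition) {
        return cache.computeIfAbsent(topic, t -> {
            Hasher hasher = MURMUR3.newHasher()
                .putString(t, StandardCharsets.UTF_8)
                .putInt(numPartitions);
            // Include racks so that a rack topology change produces a different hash.
            // Iterate partitions in sorted order to keep the hash deterministic.
            new TreeMap<>(racksByPartition).forEach((partition, racks) -> {
                hasher.putInt(partition);
                racks.forEach(rack -> hasher.putString(rack, StandardCharsets.UTF_8));
            });
            return hasher.hash().asLong();
        });
    }

    // Remove the entry when the topic changes or is deleted.
    public void invalidate(String topic) {
        cache.remove(topic);
    }
}

The entry is removed when the topic changes or is deleted, and the next consumer
heartbeat repopulates it.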

DJ03: Added the update path to the KIP for both cases.

DJ04: Yes, it’s a good idea. With the cache mechanism and a single hash per
group, we can balance CPU and disk usage.
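
A rough sketch of the combine step, assuming the cached per-topic hashes above
(XOR is just one possible order-independent way to combine them, not something
the KIP prescribes):

import java.util.Collection;

public final class GroupHash {
    private GroupHash() { }

    // Combine the cached per-topic hashes of the subscribed topics into one value.
    public static long combine(Collection<Long> topicHashes) {
        long groupHash = 0L;
        for (long topicHash : topicHashes) {
            groupHash ^= topicHash; // order-independent, so subscription order does not matter
        }
        return groupHash;
    }
}

The resulting value could then be stored together with the group epoch, as you
suggested.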

DJ05: Currently, the topic hash is only used in the coordinator, while the
metadata image is used in many different places.
How about we move the hash to the metadata image when we find more use cases?

AS1, AS2: Thanks for the reminder. I will simply delete 
ShareGroupPartitionMetadataKey/Value and add a new field to 
ShareGroupMetadataValue.

Best,
PoAn

> On Dec 17, 2024, at 5:50 AM, Andrew Schofield 
> <andrew_schofield_j...@outlook.com> wrote:
> 
> Hi PoAn,
> Thanks for the KIP.
> 
> AS1: From the point of view of share groups, the API and record schema
> definitions are unstable in AK 4.0. In AK 4.1, we will start supporting proper
> versioning. As a result, I think you do not need to deprecate the fields in 
> the
> ShareGroupPartitionMetadataValue. Just include the schema for the fields
> which are actually needed, and I'll update the schema in the code when
> the KIP is implemented.
> 
> AS2: In the event that DJ04 actually removes the need for
> ConsumerGroupPartitionMetadataKey/Value entirely, I would simply
> delete ShareGroupPartitionMetadataKey/Value, assuming that it is
> accepted in time for AK 4.1.
> 
> 
> Thanks,
> Andrew
> ________________________________________
> From: Chia-Ping Tsai <chia7...@apache.org>
> Sent: 16 December 2024 16:27
> To: dev@kafka.apache.org <dev@kafka.apache.org>
> Subject: Re: [DISCUSS] KIP-1101: Trigger rebalance on rack topology changes
> 
> hi David
> 
>> DJ05
> 
> One of the benefits of having a single hash per group (DJ04) is the reduction 
> in the size of stored data. Additionally, the cost of re-computing can be 
> minimized thanks to caching. So I'm +1 to DJ04. However, the advantage of 
> storing the topic cache in the metadata image is somewhat unclear to me. 
> Could you please provide more details on what you mean by 
> "tight"?Furthermore, since the metadata image is a thread-safe object, we 
> need to ensure that the lazy initialization is also thread-safe. If no other 
> components require the cache, it would be better to keep the caches within 
> the coordinator.
> 
> Best,
> Chia-Ping
> 
> On 2024/12/16 14:01:35 David Jacot wrote:
>> Hi PoAn,
>> 
>> Thanks for the KIP. I have some comments about it.
>> 
>> DJ01: Please, remove all the code from the KIP. We only care about public
>> interface changes, not about implementation details.
>> DJ02: Regarding the hash computation, I agree that we should use Murmur3.
>> However, I don't quite like the implementation that you shared. I wonder if
>> we could make it work incrementally instead of computing a hash of
>> everything and combining them.
>> DJ03: Regarding the cache, my understanding is that the cache is populated
>> when a topic without hash is seen in a HB request and the cache is cleaned
>> up when topics are deleted based on the metadata image. However, the update
>> path is not clear. Let's say that a partition is added to a topic, how does
>> it detect it? Let's also imagine that the racks of a partition have
>> changed, how does it detect it? In the KIP, it would be nice to be clear
>> about those.
>> DJ04: I wonder whether we should go with a single hash per group. Your
>> argument against it is that it would require to re-compute the hash of all
>> the topics when it needs to be computed. In my opinion, we could leverage
>> the cached hash per topic to compute the hash of all the subscribed ones.
>> We could basically combine all the hashes without having to compute all of
>> them. This approach has a few benefits. 1) We could get rid of
>> the ConsumerGroupPartitionMetadata record as we could store the hash with
>> the group epoch. 2) We could get rid of the Map that we keep in each group
>> to store the hashes corresponding to the subscribed topics.
>> DJ05: Regarding the cache again, I wonder if we should actually store the
>> hash in the metadata image instead of maintaining it somewhere else. We
>> could still lazily compute it. The benefit is that the value would be tied
>> to the topic. I have not really looked into it. Would it be an option?
>> 
>> I'll be away for two weeks starting from Saturday. I kindly ask you to wait
>> on me if we cannot conclude this week.
>> 
>> Best,
>> David
>> 
>> On Tue, Nov 5, 2024 at 1:43 PM Frank Yang <yangp...@gmail.com> wrote:
>> 
>>> Hi Chia-Ping,
>>> 
>>> Thanks for the review and suggestions.
>>> 
>>> Q0: Add how a rack change happens and how it affects topic partitions.
>>> 
>>> Q1: Add why we need a balance algorithm to Motivation section.
>>> 
>>> Q2: After checking again, we don’t need to update cache when we replay
>>> records. We only need to renew it in consumer heartbeat.
>>> 
>>> Q3: Add a new section “Topic Hash Function”.
>>> 
>>> Thanks.
>>> PoAn
>>> 
>>>> On Nov 1, 2024, at 4:39 PM, Chia-Ping Tsai <chia7...@gmail.com> wrote:
>>>> 
>>>> hi PoAn
>>>> 
>>>> Thanks for this KIP!
>>>> 
>>>> Q0: Could you add more details about `A topic partition has rack change`?
>>>> IIRC, the "rack change" includes both follower and leader, right?
>>>> 
>>>> Q1: Could you please add the 'concerns' we discussed to the Motivation
>>>> section? This should include topics like 'computations' and 'space
>>> usage'.
>>>> 
>>>> Q2: `The group coordinator can leverage it to add a new topic hash.` This
>>>> description seems a bit off to me. Why do we need to update the cache at
>>>> this phase? The cache is intended to prevent duplicate computations
>>> caused
>>>> by heartbeat requests that occur between two metadata change events.
>>>> Therefore, we could even remove the changed topics from caches on a
>>>> metadata change, as the first heartbeat request would update the caches
>>> for
>>>> all changed topics.
>>>> 
>>>> Q3: Could you please include a section about the choice of hash
>>>> implementation? The hash implementation must be consistent across
>>> different
>>>> JDKs, so we use Murmur3 to generate the hash value.
>>>> 
>>>> Best,
>>>> Chia-Ping
>>>> 
>>>> 
>>>> Frank Yang <yangp...@gmail.com> wrote on Fri, Nov 1, 2024 at 3:57 PM:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> Hi all, I would like to start a discussion thread on KIP-1101: Trigger rebalance
>>>>> on rack topology changes. In this KIP, we aim to use less memory / disk
>>>>> resources to detect rack changes in the new coordinator.
>>>>> 
>>>>> 
>>>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1101%3A+Trigger+rebalance+on+rack+topology+changes
>>>>> 
>>>>> Please take a look and feel free to share any thoughts.
>>>>> 
>>>>> Thanks.
>>>>> PoAn
>>> 
>>> 
>> 
