Hi PoAn,
Thanks for the KIP.

AS1: From the point of view of share groups, the API and record schema
definitions are unstable in AK 4.0. In AK 4.1, we will start supporting proper
versioning. As a result, I think you do not need to deprecate the fields in the
ShareGroupPartitionMetadataValue. Just include the schema for the fields
which are actually needed, and I'll update the schema in the code when
the KIP is implemented.

AS2: In the event that DJ04 actually removes the need for
ConsumerGroupPartitionMetadataKey/Value entirely, I would simply
delete ShareGroupPartitionMetadataKey/Value, assuming that it is
accepted in time for AK 4.1.


Thanks,
Andrew
________________________________________
From: Chia-Ping Tsai <chia7...@apache.org>
Sent: 16 December 2024 16:27
To: dev@kafka.apache.org <dev@kafka.apache.org>
Subject: Re: [DISCUSS] KIP-1101: Trigger rebalance on rack topology changes

hi David

> DJ05

One of the benefits of having a single hash per group (DJ04) is the reduction
in the size of stored data. Additionally, the cost of re-computing can be
minimized thanks to caching. So I'm +1 to DJ04. However, the advantage of
storing the topic cache in the metadata image is somewhat unclear to me. Could
you please provide more details on what you mean by the value being "tied" to
the topic? Furthermore, since the metadata image is a thread-safe object, we
need to ensure that the lazy initialization is also thread-safe. If no other
components require the cache, it would be better to keep the caches within the
coordinator.
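
To be concrete about that last point, keeping the cache inside the coordinator
could be as simple as a concurrent map with lazy per-topic computation. The
sketch below is only an illustration with hypothetical names, not actual
coordinator code:

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper owned by the group coordinator (not the metadata image).
public final class CoordinatorTopicHashCache {

    // Topic name -> cached hash; computeIfAbsent gives thread-safe lazy initialization.
    private final Map<String, Long> hashes = new ConcurrentHashMap<>();

    // Returns the cached hash, computing it on the first heartbeat that needs it.
    public long topicHash(String topic) {
        return hashes.computeIfAbsent(topic, this::computeTopicHash);
    }

    // Invalidate entries when the metadata image reports deleted or changed topics,
    // so the next heartbeat recomputes them.
    public void evict(Set<String> topics) {
        topics.forEach(hashes::remove);
    }

    // Stand-in for the real per-topic hash over name, partitions and racks.
    private long computeTopicHash(String topic) {
        return topic.hashCode();
    }
}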

Best,
Chia-Ping

On 2024/12/16 14:01:35 David Jacot wrote:
> Hi PoAn,
>
> Thanks for the KIP. I have some comments about it.
>
> DJ01: Please, remove all the code from the KIP. We only care about public
> interface changes, not about implementation details.
> DJ02: Regarding the hash computation, I agree that we should use Murmur3.
> However, I don't quite like the implementation that you shared. I wonder if
> we could make it work incrementally instead of computing a hash of
> everything and combining them.
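
For illustration, an incremental per-topic hash with a streaming Murmur3
hasher (Guava's here) could look roughly like the sketch below; the fields fed
into the hasher (name, partition count, replica racks) are assumptions rather
than the KIP's final definition:

import com.google.common.hash.HashCode;
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
import java.util.List;

public final class TopicHashSketch {

    // Streams the topic data into a Murmur3 hasher piece by piece instead of
    // building one big intermediate structure and hashing it at the end.
    static HashCode topicHash(String topicName, List<List<String>> racksPerPartition) {
        Hasher hasher = Hashing.murmur3_128().newHasher();
        hasher.putString(topicName, StandardCharsets.UTF_8);
        hasher.putInt(racksPerPartition.size());                // partition count
        for (int partition = 0; partition < racksPerPartition.size(); partition++) {
            hasher.putInt(partition);                           // partition id
            for (String rack : racksPerPartition.get(partition)) {
                hasher.putString(rack, StandardCharsets.UTF_8); // replica racks
            }
        }
        return hasher.hash();
    }

    public static void main(String[] args) {
        System.out.println(topicHash("foo",
            List.of(List.of("rack-0", "rack-1"), List.of("rack-1", "rack-2"))));
    }
}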
> DJ03: Regarding the cache, my understanding is that the cache is populated
> when a topic without hash is seen in a HB request and the cache is cleaned
> up when topics are deleted based on the metadata image. However, the update
> path is not clear. Let's say that a partition is added to a topic, how does
> it detect it? Let's also imagine that the racks of a partition have
> changed, how does it detect it? In the KIP, it would be nice to be clear
> about those.
> DJ04: I wonder whether we should go with a single hash per group. Your
> argument against it is that it would require re-computing the hash of all
> the topics when it needs to be computed. In my opinion, we could leverage
> the cached hash per topic to compute the hash of all the subscribed ones.
> We could basically combine all the hashes without having to recompute all
> of them. This approach has a few benefits. 1) We could get rid of
> the ConsumerGroupPartitionMetadata record as we could store the hash with
> the group epoch. 2) We could get rid of the Map that we keep in each group
> to store the hashes corresponding to the subscribed topics.
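
As a rough illustration of the combining idea (hypothetical names, assuming
Guava's Murmur3 and at least one subscribed topic), the cached per-topic
hashes could be folded into a single group hash without rehashing any topic
data:

import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public final class GroupHashSketch {

    // Combines already-cached per-topic hashes into one hash for the group.
    // Sorting by topic name keeps the result independent of subscription order.
    static long groupHash(Map<String, HashCode> cachedTopicHashes) {
        List<HashCode> ordered = List.copyOf(new TreeMap<>(cachedTopicHashes).values());
        return Hashing.combineOrdered(ordered).asLong();
    }

    public static void main(String[] args) {
        Map<String, HashCode> cache = Map.of(
            "foo", Hashing.murmur3_128().hashUnencodedChars("foo-metadata"),
            "bar", Hashing.murmur3_128().hashUnencodedChars("bar-metadata"));
        System.out.println(groupHash(cache));
    }
}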
> DJ05: Regarding the cache again, I wonder if we should actually store the
> hash in the metadata image instead of maintaining it somewhere else. We
> could still lazily compute it. The benefit is that the value would be tied
> to the topic. I have not really looked into it. Would it be an option?
>
> I'll be away for two weeks starting from Saturday. I kindly ask you to wait
> on me if we cannot conclude this week.
>
> Best,
> David
>
> On Tue, Nov 5, 2024 at 1:43 PM Frank Yang <yangp...@gmail.com> wrote:
>
> > Hi Chia-Ping,
> >
> > Thanks for the review and suggestions.
> >
> > Q0: Add how a rack change happens and how it affects topic partitions.
> >
> > Q1: Add why we need a balance algorithm to Motivation section.
> >
> > Q2: After checking again, we don’t need to update the cache when we replay
> > records. We only need to renew it in the consumer heartbeat.
> >
> > Q3: Add a new section “Topic Hash Function”.
> >
> > Thanks.
> > PoAn
> >
> > > On Nov 1, 2024, at 4:39 PM, Chia-Ping Tsai <chia7...@gmail.com> wrote:
> > >
> > > hi PoAn
> > >
> > > Thanks for this KIP!
> > >
> > > Q0: Could you add more details about `A topic partition has rack change`?
> > > IIRC, the "rack change" includes both follower and leader, right?
> > >
> > > Q1: Could you please add the 'concerns' we discussed to the Motivation
> > > section? This should include topics like 'computations' and 'space
> > > usage'.
> > >
> > > Q2: `The group coordinator can leverage it to add a new topic hash.` This
> > > description seems a bit off to me. Why do we need to update the cache at
> > > this phase? The cache is intended to prevent duplicate computations
> > > caused by heartbeat requests that occur between two metadata change
> > > events. Therefore, we could even remove the changed topics from the
> > > caches on a metadata change, as the first heartbeat request would update
> > > the caches for all changed topics.
> > >
> > > Q3: Could you please include a section about the choice of hash
> > > implementation? The hash implementation must be consistent across
> > > different JDKs, so we use Murmur3 to generate the hash value.
> > >
> > > Best,
> > > Chia-Ping
> > >
> > >
> > > On Fri, Nov 1, 2024 at 3:57 PM Frank Yang <yangp...@gmail.com> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I would like to start a discussion thread on KIP-1101: Trigger rebalance
> > >> on rack topology changes. In this KIP, we aim to use fewer memory and
> > >> disk resources to detect rack changes in the new coordinator.
> > >>
> > >>
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1101%3A+Trigger+rebalance+on+rack+topology+changes
> > >>
> > >> Please take a look and feel free to share any thoughts.
> > >>
> > >> Thanks.
> > >> PoAn
> >
> >
>
