Thanks, PoAn!

On Thu, Apr 17, 2025 at 8:59 AM PoAn Yang <yangp...@gmail.com> wrote:
Hi David,

Thanks for the review. Appreciated it.

DJ05: Let's discuss more in the pull request.

DJ07: Added the section "Bump Group Version" to "Rejected Alternatives".

Thanks,
PoAn

On Apr 16, 2025, at 4:34 PM, David Jacot <dja...@confluent.io.INVALID> wrote:

Hi PoAn,

Thanks for your very detailed answers!

DJ05: This seems reasonable. We can look into it more deeply during the
implementation too.

DJ07: Yeah, I see your point. I don't have a strict opinion on this so I am
happy to go with the majority. Could you please at least mention it in the
rejected alternatives section? It would be good to have a trace of it.

Best,
David

On Wed, Apr 16, 2025 at 4:23 AM PoAn Yang <yangp...@gmail.com> wrote:

Hi Lucas,

Thank you. Here is the vote thread:
https://lists.apache.org/thread/j90htmphjk783p697jjfg1xt8thmy33p.

Regards,
PoAn

On Apr 16, 2025, at 12:07 AM, Lucas Brutschy <lbruts...@confluent.io.INVALID> wrote:

Hi PoAn,

from my side, we are good to move to a vote on this one.

Cheers,
Lucas

On Thu, Apr 10, 2025 at 11:29 AM PoAn Yang <yangp...@gmail.com> wrote:

Hi David and Lucas,

Thanks for the review and great suggestions.

DJ01: Added the statement that rack-aware rebalancing was removed in 4.0.

DJ02: Changed ConsumerGroupPartitionMetadataKey and
ConsumerGroupPartitionMetadataValue from removed to deprecated.

DJ03: In my sample implementation <https://github.com/apache/kafka/pull/17444>,
the cache is maintained in GroupCoordinatorManager. A GroupCoordinatorShard only
has one GroupCoordinatorManager, so I think it's equivalent to a cache per
GroupCoordinatorShard instance.

DJ04: Added the order in which topic data is combined to compute a topic hash.

DJ05: This is a good question. For the topic hash, Guava applies a "final mix" to
ensure an avalanche effect. However, for the single hash of the subscribed topics
in a group, we cannot use a simple sum, because we need to avoid an edge case:
hash(a+b) and hash(b+a) would be the same. To fix this, the hash function sorts
topics by name and multiplies each topic hash by its position value, like
hash(a + 2b) versus hash(b + 2a). I added this case and the regular expression
case to the KIP.
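(For illustration only: a rough sketch of the position-weighted combination
described above. The class and method names are made up and the exact weighting
in the KIP/PR may differ; this only shows why the order of topics matters.)

    import java.util.SortedMap;
    import java.util.TreeMap;

    final class GroupHashExample {
        // Combine per-topic hashes in a deterministic order so that swapping two
        // topics changes the result, i.e. hash(a + 2b) != hash(b + 2a).
        static long combineByPosition(SortedMap<String, Long> topicHashesByName) {
            long combined = 0;
            int position = 1; // 1-based position of the topic in sorted-name order
            for (long topicHash : topicHashesByName.values()) {
                combined += position * topicHash;
                position++;
            }
            return combined;
        }

        public static void main(String[] args) {
            SortedMap<String, Long> hashes = new TreeMap<>();
            hashes.put("topic-a", 11L);
            hashes.put("topic-b", 22L);
            System.out.println(combineByPosition(hashes)); // 11 + 2 * 22 = 55
        }
    }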
DJ06: It's good to add a tombstone for ConsumerGroupPartitionMetadata, so log
compaction can remove the old data. Added this part to the compatibility section.

DJ07: In a previous discussion, you mentioned that assignors are sticky. If the
topology of the group is the same, the upgrade/downgrade rebalance doesn't change
anything.

> DJ08: In my opinion, changing the format will be rare so it is acceptable if
> rebalances are triggered in this case on upgrade/downgrade. It is also what will
> happen if a cluster is downgraded from 4.1 (with this KIP) to 4.0. The rebalance
> won't change anything if the topology of the group is the same because assignors
> are sticky. The default ones are and we recommend custom ones to also be.

With a `group.version` change, we need to maintain two code paths in 4.1: one for
subscription metadata without racks and another for the group hash. Another
drawback is that changing `group.version` would trigger a rebalance for all groups
at the same time. IMHO, I would prefer to keep the code simple.

DJ08: Added ConsumerGroupHeartbeatRequestTest to the test plan.

LB01: Added the removal of StreamsGroupPartitionMetadataKey /
StreamsGroupPartitionMetadataValue to the public interface changes section.

Thank you.

Regards,
PoAn

On Apr 9, 2025, at 5:11 PM, Lucas Brutschy <lbruts...@confluent.io.INVALID> wrote:

Hi PoAn,

LB01: please add the removal of StreamsGroupPartitionMetadataKey /
StreamsGroupPartitionMetadataValue to the KIP. Since these were not yet included
in 4.0, we can just remove them as long as this KIP hits 4.1. I can help with the
integration into the `streamsGroupHeartbeat`, as it will be a little different,
but we do not need to call this out in the KIP.

Thanks for your work on this!

Cheers,
Lucas

On Wed, Apr 9, 2025 at 9:59 AM David Jacot <dja...@confluent.io.invalid> wrote:

Hi PoAn,

I finally had a bit of time to review your KIP. First, let me say that the KIP is
well written! I have some more comments:

DJ01: In the motivation, could you please mention that we actually removed
triggering rebalances based on rack changes in 4.0? You explained very well why we
removed it but it is not clear that we removed it.

DJ02: We cannot delete ConsumerGroupPartitionMetadataKey and
ConsumerGroupPartitionMetadataValue because the coordinator must still be able to
read the record from the log when the cluster is upgraded. However, we can
deprecate them.

DJ03: Could you clarify how the cache will be scoped? My understanding is that we
will maintain a cache per GroupCoordinatorShard instance. Is this correct?

DJ04: Regarding the topic hash function, would it be possible to specify how the
hash will be computed instead of providing the signature of the method?

DJ05: I have a general question about how you propose to combine hashes. Is using
a sum enough? I was looking at the Guava implementation and they also apply a
"final mix" to ensure an avalanche effect. It would be great to elaborate a bit
more on this in the KIP.
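(For reference, the "final mix" mentioned here is MurmurHash3's 64-bit finalizer,
shown below; Guava applies an equivalent step internally. This is only to make the
term concrete and is not part of the KIP text.)

    final class Fmix64Example {
        // MurmurHash3's 64-bit finalization mix: a small change in the input flips
        // roughly half of the output bits, which is the avalanche effect.
        static long fmix64(long h) {
            h ^= h >>> 33;
            h *= 0xff51afd7ed558ccdL;
            h ^= h >>> 33;
            h *= 0xc4ceb9fe1a85ec53L;
            h ^= h >>> 33;
            return h;
        }

        public static void main(String[] args) {
            System.out.println(Long.toHexString(fmix64(1L)));
            System.out.println(Long.toHexString(fmix64(2L))); // very different output
        }
    }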
DJ06: After the upgrade, we need to ensure that a tombstone is written for the
ConsumerGroupPartitionMetadata record if it was present. I suppose that this is
something that we could do when the next metadata hash is computed. Have you
thought about it?

DJ07: I wonder whether we should gate the change with `group.version`. My concern
is that during an upgrade, not all the coordinators will support the new way and
groups may bounce between those coordinators as their underlying
__consumer_offsets partitions are moved within the cluster (during the roll). This
may trigger unnecessary rebalances. One way to avoid this is to gate the change
with `group.version`, which is only bumped at the end of the cluster upgrade
process. What do you think?

DJ08: Regarding the test plan, we should also extend
ConsumerGroupHeartbeatRequestTest to cover the new feature.

Thanks for your hard work on this one. Let's try to get it in 4.1.

Best,
David

On Mon, Mar 24, 2025 at 8:17 PM Jeff Kim <jeff....@confluent.io.invalid> wrote:

Hi PoAn,

Thanks for the clarification!

Best,
Jeff

On Mon, Mar 24, 2025 at 11:43 AM PoAn Yang <yangp...@gmail.com> wrote:

Hi Jeff,

Thanks for the review.

JK01. Yes, we already maintain a `groupByTopics` data structure in the
coordinator. If a group leaves, it checks whether the topic is still subscribed by
any group. If it's not, the topic hash is removed from the cache. Please check the
sample implementation of GroupMetadataManager#unsubscribeGroupFromTopic in #17444
<https://github.com/apache/kafka/pull/17444>.

JK02: This KIP replaces subscription metadata with a group hash, so the
coordinator calculates the group hash where it used to calculate subscription
metadata. The trigger points are: a group member is updated, a new group member
joins, new subscribed topic names, new regular expressions, and new metadata
(topic changes).

Best Regards,
PoAn
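(Purely illustrative: a minimal sketch of the bookkeeping JK01/JK02 refer to,
where the cached topic hash is dropped once no group subscribes to the topic any
more. Field and method names are assumptions, not the actual GroupMetadataManager
code in #17444.)

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    final class TopicHashCacheExample {
        private final Map<String, Long> topicHashCache = new HashMap<>();
        private final Map<String, Set<String>> groupsByTopic = new HashMap<>();

        void subscribeGroupToTopic(String groupId, String topicName, long topicHash) {
            groupsByTopic.computeIfAbsent(topicName, t -> new HashSet<>()).add(groupId);
            topicHashCache.putIfAbsent(topicName, topicHash);
        }

        void unsubscribeGroupFromTopic(String groupId, String topicName) {
            Set<String> groups = groupsByTopic.get(topicName);
            if (groups == null) {
                return;
            }
            groups.remove(groupId);
            if (groups.isEmpty()) {
                // No group subscribes to this topic any more: drop the stale hash.
                groupsByTopic.remove(topicName);
                topicHashCache.remove(topicName);
            }
        }
    }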
On Mar 22, 2025, at 3:25 AM, Jeff Kim <jeff....@confluent.io.INVALID> wrote:

Hi PoAn,

I was late to the discussion but I had some questions.

JK01. I foresee scenarios where the cache leaves stale topic hashes around when a
group leaves. Should we maintain a counter for the number of groups subscribed to
the topic (hash) to only keep active topics?

JK02. Can you help me understand when we would actually be computing group-level
hashes, is it on every heartbeat?

Thanks,
Jeff

On Wed, Mar 12, 2025 at 11:33 PM PoAn Yang <yangp...@gmail.com> wrote:

Hi David,

Yes, I will continue working on this.

DJ02: As Kirk noted earlier, since the topic hash calculation is on the server
side (not client-side), we can safely introduce Guava as a dependency.

If there is no other discussion, we can start voting in
https://lists.apache.org/thread/j90htmphjk783p697jjfg1xt8thmy33p. We can discuss
implementation details in PR https://github.com/apache/kafka/pull/17444. I will
resolve merge conflicts by March 14th.

Thanks,
PoAn

On Mar 13, 2025, at 12:43 AM, David Jacot <david.ja...@gmail.com> wrote:

Hi PoAn,

As we are getting closer to releasing 4.0, I think that we can resume working on
this one if you are still interested. Are you? I would like to have it in 4.1.

Best,
David

On Wed, Jan 15, 2025 at 1:26 AM Kirk True <k...@kirktrue.pro> wrote:

Hi all,

Hopefully a quick question...

KT01. Will clients calculate the topic hash on the client? Based on the current
state of the KIP and PR, I would have thought "no", but I ask based on the
discussion around the possible use of Guava on the client.

Thanks,
Kirk

On Mon, Jan 6, 2025, at 9:11 AM, David Jacot wrote:

Hi PoAn,

Thanks for the update. I haven't read the updated KIP yet.

DJ02: I am not sure about using Guava as a dependency. I mentioned it more as an
inspiration/reference. I suppose that we could use it on the server but we should
definitely not use it on the client. I am not sure how others feel about it.

Best,
David

On Mon, Jan 6, 2025 at 5:21 AM PoAn Yang <yangp...@gmail.com> wrote:

Hi Chia-Ping / David / Lucas,

Happy new year and thanks for the review.

DJ02: Thanks for the suggestion. I updated the PR to use Guava.

DJ03: Yes, I updated the description to mention ISR changes, added the altering
partition reassignment case, and mentioned that an unrelated topic change doesn't
trigger a rebalance.
DJ03.1: Yes, I will keep using ModernGroup#requestMetadataRefresh to notify groups.

DJ06: Updated the PR to use Guava's Hashing#combineUnordered function to combine
topic hashes.

DJ07: Renamed it to MetadataHash.

DJ08: Added a sample hash function to the KIP and used the first byte as a magic
byte. This is also included in the latest PR.
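(For illustration: a hypothetical sketch of such a per-topic hash — a leading
magic byte as the format version, then the topic name, partition count, and the
pre-sorted racks of each partition, fed incrementally into Guava's Murmur3 hasher.
The exact field order and types in the KIP/PR may differ.)

    import com.google.common.hash.Hasher;
    import com.google.common.hash.Hashing;
    import java.nio.charset.StandardCharsets;
    import java.util.List;

    final class TopicHashExample {
        // Magic byte identifying the hash layout, so the format can evolve later.
        private static final byte MAGIC = 0;

        static long topicHash(String topicName, int numPartitions,
                              List<List<String>> sortedRacksByPartition) {
            Hasher hasher = Hashing.murmur3_128().newHasher()
                .putByte(MAGIC)
                .putString(topicName, StandardCharsets.UTF_8)
                .putInt(numPartitions);
            for (int partitionId = 0; partitionId < numPartitions; partitionId++) {
                hasher.putInt(partitionId);
                for (String rack : sortedRacksByPartition.get(partitionId)) {
                    // Racks are assumed to be pre-sorted per partition.
                    hasher.putString(rack, StandardCharsets.UTF_8);
                }
            }
            return hasher.hash().asLong();
        }
    }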
DJ09: Added two paragraphs about upgrades and downgrades.

DJ10: According to Lucas's comment, I added the StreamsGroupMetadataValue update
to this KIP.

Thanks,
PoAn

On Dec 20, 2024, at 3:58 PM, Chia-Ping Tsai <chia7...@gmail.com> wrote:

> because assignors are sticky.

I forgot about that spec again :(

On Fri, Dec 20, 2024 at 3:41 PM, David Jacot <david.ja...@gmail.com> wrote:

Hi Chia-Ping,

DJ08: In my opinion, changing the format will be rare so it is acceptable if
rebalances are triggered in this case on upgrade/downgrade. It is also what will
happen if a cluster is downgraded from 4.1 (with this KIP) to 4.0. The rebalance
won't change anything if the topology of the group is the same because assignors
are sticky. The default ones are, and we recommend custom ones to also be.

Best,
David

On Fri, Dec 20, 2024 at 2:11 AM Chia-Ping Tsai <chia7...@apache.org> wrote:

ummm, it does not work for downgrade as the old coordinator has no idea about the
new format :(

On 2024/12/20 00:57:27 Chia-Ping Tsai wrote:

hi David

> DJ08:

That's a good question. If the "hash" lacks version control, it could trigger a
series of unnecessary rebalances. However, adding additional information ("magic")
to the hash does not help the upgraded coordinator determine the "version". This
means that the upgraded coordinator would still trigger unnecessary rebalances
because it has no way to know which format to use when comparing the hash.

Perhaps we can add a new field to ConsumerGroupMetadataValue to indicate the
version of the "hash". This would allow the coordinator, when handling
subscription metadata, to compute the old hash and determine whether an epoch bump
is necessary. Additionally, the coordinator can generate a new record to upgrade
the hash without requiring an epoch bump.

Another issue is whether the coordinator should cache all versions of the hash. I
believe this is necessary; otherwise, during an upgrade, there would be extensive
recomputing of old hashes.

I believe this idea should also work for downgrades, and that's just my two cents.

Best,
Chia-Ping

On 2024/12/19 14:39:41 David Jacot wrote:

Hi PoAn and Chia-Ping,

Thanks for your responses.
DJ02: Sorry, I was not clear. I was wondering whether we could compute the hash
without having to convert to bytes first. Guava has a nice interface for this that
allows incrementally adding primitive types to the hash. We can discuss this in
the PR as it is an implementation detail.

DJ03: Thanks. I don't think that the replicas are updated when a broker shuts
down. What you said applies to the ISR. I suppose that we can rely on the ISR
changes to trigger updates. It is also worth noting that TopicsDelta#changedTopics
is updated for every change (e.g. ISR change, leader change, replicas change,
etc.). I suppose that it is OK but it seems that it will trigger refreshes which
are not necessary. However, a rebalance won't be triggered because the hash won't
change.
DJ03.1: I suppose that we will continue to rely on
ModernGroup#requestMetadataRefresh to notify groups that must refresh their
hashes. Is my understanding correct?

DJ05: Fair enough.

DJ06: You mention in two places that you would like to combine hashes by adding
them. I wonder if this is a good practice. Intuitively, I would have used XOR or
hashed the hashes. Guava has a method for combining hashes. It may be worth
looking into the algorithm used.
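(For reference, the Guava combinators being referred to; the topic names below are
made up. combineOrdered is order-sensitive, combineUnordered is not, and both
require hash codes of the same bit length.)

    import com.google.common.hash.HashCode;
    import com.google.common.hash.Hashing;
    import java.nio.charset.StandardCharsets;
    import java.util.List;

    final class GuavaCombineExample {
        public static void main(String[] args) {
            HashCode a = Hashing.murmur3_128().hashString("topic-a", StandardCharsets.UTF_8);
            HashCode b = Hashing.murmur3_128().hashString("topic-b", StandardCharsets.UTF_8);

            // Order-sensitive: combineOrdered(a, b) differs from combineOrdered(b, a).
            HashCode ordered = Hashing.combineOrdered(List.of(a, b));
            // Order-insensitive: combineUnordered(a, b) equals combineUnordered(b, a).
            HashCode unordered = Hashing.combineUnordered(List.of(a, b));

            System.out.println(ordered + " " + unordered);
        }
    }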
DJ07: I would rename "AllTopicHash" to "MetadataHash" in order to be more generic.

DJ08: Regarding the per-topic hash, I wonder whether we should specify in the KIP
how we will compute it. I had the following in mind:
hash(topicName; numPartitions; [partitionId; sorted racks]). We could also add a
magic byte as the first element as a sort of version. I am not sure whether it is
needed though. I was thinking about this while imagining how we would handle
changing the format in the future.

DJ09: It would be great if we could provide more details about backward
compatibility. What happens when the cluster is upgraded or downgraded?

DJ10: We should update KIP-1071. It may be worth pinging them in the discussion
thread of KIP-1071.

Best,
David

On Tue, Dec 17, 2024 at 9:25 AM PoAn Yang <yangp...@gmail.com> wrote:

Hi Chia-Ping / David / Andrew,

Thanks for the review and suggestions.

DJ01: Removed all implementation details.

DJ02: Does "incrementally" mean that we only calculate the parts that differ? For
example, if the number of partitions changes, we would only calculate the hash of
the number of partitions and fold it back into the topic hash. IMO, we only
calculate the topic hash once. With the cache mechanism, the value can be reused
by different groups on the same broker. The CPU usage for this part is not very
high.

DJ03: Added the update path to the KIP for both cases.

DJ04: Yes, it's a good idea. With the cache mechanism and a single hash per group,
we can balance CPU and disk usage.

DJ05: Currently, the topic hash is only used in the coordinator. However, the
metadata image is used in many different places. How about we move the hash to the
metadata image when we find more use cases?

AS1, AS2: Thanks for the reminder. I will simply delete
ShareGroupPartitionMetadataKey/Value and add a new field to
ShareGroupMetadataValue.

Best,
PoAn

On Dec 17, 2024, at 5:50 AM, Andrew Schofield <andrew_schofield_j...@outlook.com> wrote:

Hi PoAn,
Thanks for the KIP.

AS1: From the point of view of share groups, the API and record schema definitions
are unstable in AK 4.0. In AK 4.1, we will start supporting proper versioning. As
a result, I think you do not need to deprecate the fields in the
ShareGroupPartitionMetadataValue. Just include the schema for the fields which are
actually needed, and I'll update the schema in the code when the KIP is
implemented.
AS2: In the event that DJ04 actually removes the need for
ConsumerGroupPartitionMetadataKey/Value entirely, I would simply delete
ShareGroupPartitionMetadataKey/Value, assuming that it is accepted in time for
AK 4.1.

Thanks,
Andrew
________________________________________
From: Chia-Ping Tsai <chia7...@apache.org>
Sent: 16 December 2024 16:27
To: dev@kafka.apache.org <dev@kafka.apache.org>
Subject: Re: [DISCUSS] KIP-1101: Trigger rebalance on rack topology changes

hi David

> DJ05

One of the benefits of having a single hash per group (DJ04) is the reduction in
the size of stored data. Additionally, the cost of re-computing can be minimized
thanks to caching. So I'm +1 to DJ04. However, the advantage of storing the topic
cache in the metadata image is somewhat unclear to me. Could you please provide
more details on what you mean by "tight"? Furthermore, since the metadata image is
a thread-safe object, we need to ensure that the lazy initialization is also
thread-safe. If no other components require the cache, it would be better to keep
the caches within the coordinator.

Best,
Chia-Ping
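(A small aside to make the thread-safety point concrete: if a lazily computed
topic hash were ever attached to a shared object such as the metadata image, the
lazy initialization itself would need to be thread-safe. One minimal sketch uses
Guava's memoizing supplier; the class and method names here are hypothetical and
the hash computation is a placeholder.)

    import com.google.common.base.Supplier;
    import com.google.common.base.Suppliers;

    final class LazyTopicHash {
        private final Supplier<Long> hash;

        LazyTopicHash(String topicName, int numPartitions) {
            // Computed at most once and safely published to all reader threads.
            this.hash = Suppliers.memoize(() -> computeHash(topicName, numPartitions));
        }

        long value() {
            return hash.get();
        }

        private static long computeHash(String topicName, int numPartitions) {
            // Placeholder; the real computation would follow the KIP's hash layout.
            return 31L * topicName.hashCode() + numPartitions;
        }
    }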
On 2024/12/16 14:01:35 David Jacot wrote:

Hi PoAn,

Thanks for the KIP. I have some comments about it.

DJ01: Please, remove all the code from the KIP. We only care about public
interface changes, not about implementation details.

DJ02: Regarding the hash computation, I agree that we should use Murmur3. However,
I don't quite like the implementation that you shared. I wonder if we could make
it work incrementally instead of computing a hash of everything and combining
them.

DJ03: Regarding the cache, my understanding is that the cache is populated when a
topic without a hash is seen in a HB request, and the cache is cleaned up when
topics are deleted, based on the metadata image. However, the update path is not
clear. Let's say that a partition is added to a topic, how does it detect it?
Let's also imagine that the racks of a partition have changed, how does it detect
it? In the KIP, it would be nice to be clear about those.

DJ04: I wonder whether we should go with a single hash per group. Your argument
against it is that it would require re-computing the hash of all the topics when
it needs to be computed. In my opinion, we could leverage the cached hash per
topic to compute the hash of all the subscribed ones. We could basically combine
all the hashes without having to compute all of them. This approach has a few
benefits. 1) We could get rid of the ConsumerGroupPartitionMetadata record as we
could store the hash with the group epoch. 2) We could get rid of the Map that we
keep in each group to store the hashes corresponding to the subscribed topics.

DJ05: Regarding the cache again, I wonder if we should actually store the hash in
the metadata image instead of maintaining it somewhere else. We could still lazily
compute it. The benefit is that the value would be tight to the topic. I have not
really looked into it. Would it be an option?

I'll be away for two weeks starting from Saturday. I kindly ask you to wait on me
if we cannot conclude this week.

Best,
David

On Tue, Nov 5, 2024 at 1:43 PM Frank Yang <yangp...@gmail.com> wrote:

Hi Chia-Ping,

Thanks for the review and suggestions.
Q0: Added how a rack change happens and how it affects topic partitions.

Q1: Added why we need a balance algorithm to the Motivation section.

Q2: After checking again, we don't need to update the cache when we replay
records. We only need to renew it in the consumer heartbeat.

Q3: Added a new section "Topic Hash Function".

Thanks.
PoAn

On Nov 1, 2024, at 4:39 PM, Chia-Ping Tsai <chia7...@gmail.com> wrote:

hi PoAn

Thanks for this KIP!

Q0: Could you add more details about `A topic partition has rack change`? IIRC,
the "rack change" includes both follower and leader, right?

Q1: Could you please add the 'concerns' we discussed to the Motivation section?
This should include topics like 'computations' and 'space usage'.

Q2: `The group coordinator can leverage it to add a new topic hash.` This
description seems a bit off to me. Why do we need to update the cache at this
phase? The cache is intended to prevent duplicate computations caused by heartbeat
requests that occur between two metadata change events. Therefore, we could even
remove the changed topics from the caches on a metadata change, as the first
heartbeat request would update the caches for all changed topics.

Q3: Could you please include a section about the choice of hash implementation?
The hash implementation must be consistent across different JDKs, so we use
Murmur3 to generate the hash value.

Best,
Chia-Ping

On Fri, Nov 1, 2024 at 3:57 PM, Frank Yang <yangp...@gmail.com> wrote:

Hi all,

I would like to start a discussion thread on KIP-1101: Trigger rebalance on rack
topology changes.
In this KIP, we aim to use less memory / disk resources to detect rack changes in
the new coordinator.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1101%3A+Trigger+rebalance+on+rack+topology+changes

Please take a look and feel free to share any thoughts.

Thanks.
PoAn