Re: [PR] KAFKA-17747: [2/N] Add compute topic and group hash [kafka]

via GitHub Tue, 29 Apr 2025 06:10:02 -0700


FrankYang0529 commented on code in PR #19523:
URL: https://github.com/apache/kafka/pull/19523#discussion_r2066325978



##########
gradle/dependencies.gradle:
##########
@@ -147,6 +148,7 @@ libs += [
   caffeine: "com.github.ben-manes.caffeine:caffeine:$versions.caffeine",
   classgraph: "io.github.classgraph:classgraph:$versions.classgraph",
   commonsValidator: "commons-validator:commons-validator:$versions.commonsValidator",
+  guava: "com.google.guava:guava:$versions.guava",

Review Comment:
   > I also wonder what is the impact of putting all the data to a byte array 
before hashing it. Do you have thoughts on this?
   
   Per KIP-1101, the design minimizes the number of topic hash calculations, 
because the result can be shared between groups. I think we can keep this 
function simple for now.
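
To illustrate the sharing idea, here is a minimal sketch, not the code in this PR. The class `TopicHashCache`, its methods, and the use of CRC32 over the topic name are all assumptions made for the example; the actual hash covers more topic metadata and uses a different hash function. The point is only that a second group asking for the same topic reuses the cached value instead of hashing again.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.zip.CRC32;

// Hypothetical cache: one hash per topic, shared by every group that uses it.
public class TopicHashCache {
    private final Map<String, Long> cache = new ConcurrentHashMap<>();
    // Counts how many times the hash was actually computed (for demonstration).
    final AtomicInteger computations = new AtomicInteger();

    // Stand-in hash: CRC32 over the topic name bytes. This is an assumption
    // for the sketch, NOT the hash function used in the PR.
    private long compute(String topic) {
        computations.incrementAndGet();
        CRC32 crc = new CRC32();
        crc.update(topic.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Returns the cached hash, computing it only on the first request.
    long topicHash(String topic) {
        return cache.computeIfAbsent(topic, this::compute);
    }

    public static void main(String[] args) {
        TopicHashCache cache = new TopicHashCache();
        long seenByGroupA = cache.topicHash("foo");
        long seenByGroupB = cache.topicHash("foo");
        System.out.println(seenByGroupA == seenByGroupB);
        System.out.println("computations: " + cache.computations.get());
    }
}
```

With a cache like this, the per-call cost of building a byte array matters less, since it is paid only when a topic's hash is first computed or has to be recomputed.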
   
   > I suggest that EventProcessorThread can leverage GrowableBufferSupplier to 
reuse buffer as much as possible. 
   
   To reuse the buffer through a BufferSupplier, the hash function needs to 
be thread-safe. We can revisit this in the future.
   
   > Additionally, Group#computeTopicHash should use ByteBufferOutputStream 
to generate the byte array, as ByteBufferOutputStream#buffer#array can avoid 
the extra array copy made by ByteArrayOutputStream#toByteArray
   
   The ByteBufferOutputStream needs a ByteBuffer with a capacity. I wonder 
whether we can calculate an accurate capacity. For example, the rack string 
can contain any character, and each character takes 1 to 4 bytes in UTF-8.
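
A small sketch of the capacity problem, using only the JDK. The class `Utf8Capacity` and the sample rack values are made up for illustration. It shows that the exact encoded size is only known after encoding (or scanning) the string, while a cheap upper bound of 3 bytes per Java `char` can overestimate by a wide margin.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Capacity {
    // Exact encoded size: requires encoding the whole string first.
    static int exactUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Cheap upper bound: one Java char (UTF-16 code unit) needs at most
    // 3 bytes in UTF-8; a 4-byte code point occupies 2 chars, so 3 bytes
    // per char is still a safe bound.
    static int maxUtf8Length(String s) {
        return s.length() * 3;
    }

    public static void main(String[] args) {
        // Hypothetical rack values covering 1-, 2-, 3- and 4-byte characters.
        String[] racks = {"rack-a", "r\u00E4ck", "\u673A\u67B6", "\uD83D\uDE00"};
        for (String rack : racks) {
            System.out.println(rack.length() + " chars -> "
                + exactUtf8Length(rack) + " bytes (upper bound "
                + maxUtf8Length(rack) + ")");
        }
    }
}
```

So a pre-sized buffer would either need an extra pass over every string to get the exact size, or be allocated from the upper bound and waste space for mostly-ASCII values.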



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
