divijvaidya opened a new pull request, #13135: URL: https://github.com/apache/kafka/pull/13135
This covers two JIRAs: https://issues.apache.org/jira/browse/KAFKA-14632 and https://issues.apache.org/jira/browse/KAFKA-14633

## Background

Currently, we use 2 intermediate buffers while handling decompressed data (one of size 2KB and another of size 16KB). These buffers are (de)allocated once per batch. The impact of this was observed in a flamegraph analysis of a compressed workload, where 75% of CPU time during `appendAsLeader()` was taken up by `ValidateMessagesAndAssignOffsets`.

## Change

With this PR:
1. We reduce the number of intermediate buffers from 2 to 1, which removes one data copy. Note that the removed copy occurred in chunks of 2KB at a time, multiple times. This is achieved by getting rid of `BufferedInputStream` and moving to `DataInputStream`. This change has only been made for `zstd` and `gzip`.
2. We use a thread-local buffer pool for both buffers involved in decompression. This change impacts all compression types.
3. We pushed the skipping of key/value logic to

After the change, the above buffer allocation looks as follows:

## Results

After this change, a JMH benchmark for `ValidateMessagesAndAssignOffsets` demonstrated 10-50% higher throughput across all compression types with no significant regression. The improvement is most prominent when thread-cached buffer pools are used, with a 1-2% regression in some limited scenarios. When buffer pools are not used (NO_CACHING in the results), GZIP performed about 10% better in some cases, with a 1-4% regression in other scenarios. Overall, without the buffer pools, the upside of this code change is limited to single-digit improvements in certain scenarios.

Detailed results from the JMH benchmark are available here: [benchmark-jira.xlsx](https://github.com/apache/kafka/files/10465049/benchmark-jira.xlsx)

## Testing

- Sanity testing using the existing unit tests to ensure that we don't impact correctness.
- JMH benchmarks for all compression types to ensure that we did not regress other compression types.
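
The buffer handling described under Change (items 1 and 2) can be illustrated with plain JDK classes. This is a minimal, hypothetical sketch and not the code in this PR.

- The `PooledDecompressionSketch` class, the `SCRATCH` thread-local and its 16 KB size are assumptions. The size is borrowed from the figure in Background.
- Only GZIP is shown, because zstd needs a third-party library.
- `GZIPInputStream` is wrapped directly in an unbuffered `DataInputStream`, with no `BufferedInputStream` in between.
- Decompressed bytes are read into one per-thread scratch buffer that is reused across batches instead of being allocated per batch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class PooledDecompressionSketch {
    // Hypothetical per-thread pool holding a single 16 KB scratch buffer.
    // Each thread allocates it once and reuses it for every batch, instead of
    // allocating and discarding a new buffer per batch.
    static final int SCRATCH_SIZE = 16 * 1024;
    static final ThreadLocal<byte[]> SCRATCH =
            ThreadLocal.withInitial(() -> new byte[SCRATCH_SIZE]);

    // Helper used only to produce compressed input for the demo.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decompresses one batch and appends the bytes to `out`.
    // The GZIPInputStream is wrapped directly in a DataInputStream, which adds
    // no read-ahead buffer of its own. There is no BufferedInputStream layer
    // to copy through: decompressed bytes go straight into the pooled buffer.
    static long decompress(byte[] compressed, ByteArrayOutputStream out) {
        byte[] scratch = SCRATCH.get();
        long total = 0;
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(compressed)))) {
            int n;
            while ((n = in.read(scratch, 0, scratch.length)) != -1) {
                out.write(scratch, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] payload = "hello kafka ".repeat(5000).getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(payload);

        byte[] firstScratch = SCRATCH.get();
        for (int batch = 0; batch < 3; batch++) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            long n = decompress(compressed, out);
            System.out.println("batch " + batch + ": " + n + " bytes, round-trip ok = "
                    + Arrays.equals(out.toByteArray(), payload));
        }
        // Same array instance on this thread: no per-batch allocation.
        System.out.println("scratch reused = " + (SCRATCH.get() == firstScratch));
    }
}
```

Running `main` decompresses the same payload three times on one thread. Each round trip should match the original payload, and `SCRATCH.get()` keeps returning the same array instance, which is the per-batch allocation the pool avoids. The sketch deliberately omits releasing or resizing pooled buffers and the other compression types.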