[ https://issues.apache.org/jira/browse/KAFKA-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010835#comment-17010835 ]
ASF GitHub Bot commented on KAFKA-9387:
---------------------------------------

CodingFabian commented on pull request #7910: KAFKA-9387: Use non JNI LZ4 Hashing for header checksums
URL: https://github.com/apache/kafka/pull/7910

   Since they are just a few bytes, and the Java version outperforms the native version.
   See: https://lz4.github.io/lz4-java/1.3.0/xxhash-benchmark/

   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> LZ4 Compression creates significant unnecessary CPU usage
> ---------------------------------------------------------
>
>                 Key: KAFKA-9387
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9387
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.4.0
>            Reporter: Fabian Lange
>            Priority: Major
>         Attachments: Screenshot 2020-01-08 at 16.52.38.png
>
> KafkaLZ4BlockOutputStream and KafkaLZ4BlockInputStream perform checksumming
> on 3 bytes in the header. This is largely unnecessary work; this ticket
> proposes a change that makes the checksum roughly 10x faster.
> {{kafka-downstream-0 id=152 state=RUNNABLE
> at net.jpountz.xxhash.XXHashJNI.XXH32(Native Method)
> at net.jpountz.xxhash.XXHash32JNI.hash(XXHash32JNI.java:30)
> at org.apache.kafka.common.record.KafkaLZ4BlockOutputStream.writeHeader(KafkaLZ4BlockOutputStream.java:156)
> at org.apache.kafka.common.record.KafkaLZ4BlockOutputStream.<init>(KafkaLZ4BlockOutputStream.java:85)
> at org.apache.kafka.common.record.KafkaLZ4BlockOutputStream.<init>(KafkaLZ4BlockOutputStream.java:125)
> at org.apache.kafka.common.record.CompressionType$4.wrapForOutput(CompressionType.java:101)
> at org.apache.kafka.common.record.MemoryRecordsBuilder.<init>(MemoryRecordsBuilder.java:130)
> at org.apache.kafka.common.record.MemoryRecordsBuilder.<init>(MemoryRecordsBuilder.java:166)
> at org.apache.kafka.common.record.MemoryRecords.builder(MemoryRecords.java:534)
> at org.apache.kafka.common.record.MemoryRecords.builder(MemoryRecords.java:516)
> at org.apache.kafka.common.record.MemoryRecords.builder(MemoryRecords.java:464)
> at org.apache.kafka.clients.producer.internals.RecordAccumulator.recordsBuilder(RecordAccumulator.java:245)
> at org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:222)
> at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:917)
> at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:856)
> at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:743)}}
>
> By default, Kafka does not checksum compressed blocks (blockChecksum=false),
> but it does do checksumming on the header.
> The header, however, is static, so it is checksumming the same 6 or 2 bytes
> over and over again.
> Currently it uses {{XXHashFactory.fastestInstance().hash32()}},
> but that will typically be a JNI implementation.
> For 2 bytes, however, the JNI version is 10x slower than the Java one, so we
> should replace it with {{fastestJavaInstance}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
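Editor's note: the proposed fix amounts to swapping which lz4-java hash factory produces the header checksum. A minimal sketch of the idea, using the lz4-java `XXHashFactory` API named above (the header bytes here are illustrative, not the exact frame-descriptor bytes Kafka writes):

```java
import net.jpountz.xxhash.XXHash32;
import net.jpountz.xxhash.XXHashFactory;

public class HeaderChecksumSketch {
    public static void main(String[] args) {
        // Illustrative frame-descriptor bytes; the real values depend on the
        // FLG/BD flags that KafkaLZ4BlockOutputStream writes.
        byte[] header = new byte[] {0x64, 0x40};

        // fastestInstance() may select the JNI-backed implementation, which
        // pays a native-call overhead on every invocation -- wasteful for a
        // checksum over just a few header bytes.
        XXHash32 maybeJni = XXHashFactory.fastestInstance().hash32();

        // fastestJavaInstance() stays in pure Java, avoiding JNI overhead.
        XXHash32 pureJava = XXHashFactory.fastestJavaInstance().hash32();

        // Both implementations compute the same xxHash32 value;
        // only the per-call cost differs.
        int a = maybeJni.hash(header, 0, header.length, 0);
        int b = pureJava.hash(header, 0, header.length, 0);
        assert a == b;
        System.out.println(Integer.toHexString(b));
    }
}
```

Because both factories produce identical xxHash32 values, the checksum bytes on the wire are unchanged; the swap in KafkaLZ4BlockOutputStream/KafkaLZ4BlockInputStream would be a drop-in replacement affecting only CPU cost.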