[ https://issues.apache.org/jira/browse/KAFKA-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797927#comment-16797927 ]
Flower.min commented on KAFKA-8106:
-----------------------------------

_*We only skip decompressing the key and value of each record; the offset and timestamp can still be validated.*_ We recommend removing the unnecessary decompression operation when validating compressed messages, provided that all of the following conditions are met:

*1. The compression type of the Kafka brokers and the topic is consistent with that of the Kafka producer.*
*2. The magic value in use is above 1.*
*3. No format conversion or value overwriting is required for the compressed messages.*
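For illustration, below is a minimal Java sketch of such a guard. The class name, the method name, and the needsValueOverwrite flag are assumptions made for this example; the actual validation logic lives in Kafka's LogValidator (Scala) and is not reproduced here:

{code:java}
// A sketch of the proposed guard (hypothetical names, not Kafka's actual API).
import org.apache.kafka.common.record.CompressionType;
import org.apache.kafka.common.record.RecordBatch;

final class ValidationGuardSketch {

    /**
     * True when the three conditions above hold, so the validator may check
     * offsets and timestamps without decompressing each record's key and value.
     */
    static boolean canSkipKeyValueDecompression(RecordBatch batch,
                                                CompressionType sourceCodec, // producer's codec
                                                CompressionType targetCodec, // broker/topic codec
                                                byte toMagic,
                                                boolean needsValueOverwrite) {
        return sourceCodec == targetCodec                      // condition 1: codecs consistent
                && batch.magic() > RecordBatch.MAGIC_VALUE_V1  // condition 2: magic value above 1
                && batch.magic() == toMagic                    // condition 3: no format conversion...
                && !needsValueOverwrite;                       // ...and no value overwriting
    }
}
{code}

When the guard returns false (for example, during down-conversion to an older magic value), the validator would fall back to the existing full-decompression path.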
> Remove unnecessary decompression operation when LogValidator does validation.
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-8106
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8106
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.1.1
>         Environment: Server:
>                      CPU: 2*16;
>                      MemTotal: 256G;
>                      Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection;
>                      SSD.
>            Reporter: Flower.min
>            Assignee: Flower.min
>            Priority: Major
>              Labels: performance
>
> We did performance testing of Kafka in the specific scenario described below. We built a Kafka cluster with one broker and created topics with different numbers of partitions. We then started many producer processes to send large amounts of messages to one of the topics in each test.
>
> *_Specific Scenario_*
>
> *_1. Main config of Kafka_*
> # Main config of the Kafka server: num.network.threads=6; num.io.threads=128; queued.max.requests=500
> # Number of TopicPartitions: 50~2000
> # Size of a single message: 1024B
>
> *_2. Config of KafkaProducer_*
> ||compression.type||linger.ms||batch.size||buffer.memory||
> |lz4|1000ms~5000ms|16KB/10KB/100KB|128MB|
>
> *_3. The best result of performance testing_*
> ||Network inflow rate||CPU used (%)||Disk write speed||Production throughput||
> |550MB/s~610MB/s|97%~99%|550MB/s~610MB/s|23,000,000 messages/s|
>
> *_4. Phenomenon and my doubt_*
> _The upper limit of CPU usage is reached, but the upper limit of the server's network bandwidth is not. *We wondered what was costing so much CPU time, and we want to improve performance and reduce the CPU usage of the Kafka server.*_
>
> _*5. Analysis*_
> We analyzed the JFR recordings of the Kafka server taken during the performance testing. We found that the hot-spot methods are *_"java.io.DataInputStream.readFully(byte[],int,int)"_* and *_"org.apache.kafka.common.record.KafkaLZ4BlockInputStream.read(byte[],int,int)"_*. When we checked the thread stacks, we also found that most of the CPU was occupied by many threads busy decompressing messages. We then read the Kafka source code.
> There is a double-layer nested iterator when LogValidator validates every record, and every message is decompressed while traversing each RecordBatch's iterator. This decompression consumes CPU and degrades overall performance. _*When the magic value in use is above 1 and no format conversion or value overwriting is required for the compressed messages, the only purpose of decompressing every message is to obtain the total size in bytes of each record and the size in bytes of its body. This is bad for performance in common usage scenarios.*_ Therefore, we suggest *_removing the unnecessary decompression operation_* when validating compressed messages under those conditions; a sketch of such a skip-over iteration follows this quoted description.
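Below is a sketch of what the skip-over iteration might look like at the record level, assuming the magic v2 record framing (length, attributes, timestampDelta, offsetDelta, key, value, headers). The class and method names are invented for illustration and are not Kafka's actual API:

{code:java}
// A sketch (hypothetical class, not Kafka's actual implementation) of walking
// the decompressed stream of a magic v2 batch while reading only the framing
// needed for validation -- length, attributes, timestampDelta, offsetDelta --
// and skipping the key, value, and headers unread.
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

final class SkipOverRecordReader {
    private final DataInputStream in; // decompressed stream of the batch's records
    private long consumed = 0;        // bytes consumed so far, to know how much of a record remains

    SkipOverRecordReader(DataInputStream in) {
        this.in = in;
    }

    private int readByte() throws IOException {
        int b = in.readUnsignedByte();
        consumed++;
        return b;
    }

    // Zig-zag-encoded varint/varlong, as used by the magic v2 record format.
    private long readVarlong() throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = readByte();
            raw |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // zig-zag decode
    }

    /**
     * Reads one record's framing and skips the rest of its body. Returns the
     * record's size in bytes, which -- together with the offset and timestamp
     * deltas -- is all the validator needs under the conditions listed above.
     */
    long skipOverOne() throws IOException {
        long size = readVarlong();           // total size of the record body
        long bodyStart = consumed;
        readByte();                          // attributes byte (unused by validation)
        long timestampDelta = readVarlong(); // kept for timestamp validation
        long offsetDelta = readVarlong();    // kept for offset validation
        // ... offset/timestamp checks against the batch header would go here ...
        long remaining = size - (consumed - bodyStart);
        while (remaining > 0) {              // skip key, value, and headers without materializing them
            long skipped = in.skip(remaining);
            if (skipped <= 0)
                throw new EOFException("truncated record");
            remaining -= skipped;
            consumed += skipped;
        }
        return size;
    }
}
{code}

Note that for most codecs the decompressor still decodes the skipped bytes internally, but the per-record key and value buffers are no longer allocated and filled, which is where java.io.DataInputStream.readFully(byte[],int,int) showed up in the profile above.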