[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352290#comment-14352290 ]

Yasuhiro Matsuda commented on KAFKA-527:
----------------------------------------

>>This patch is mainly aimed at #1 above

If you read the patch carefully, there is more to it on the compression side. It 
avoids a copy to an intermediate buffer (byte array) when going from 
ByteArrayOutputStream to ByteBuffer, as well as a copy from ByteBuffer to 
ByteBuffer when we create a MessageSet from a Message at the end of compression.
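To illustrate the first point, here is a minimal sketch (not the actual patch; the class name ByteBufferOutputStream and the fixed-capacity buffer are assumptions) of an OutputStream that writes compressed bytes directly into their final ByteBuffer, skipping the intermediate byte[] that ByteArrayOutputStream.toByteArray() would allocate and copy:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: an OutputStream backed by a ByteBuffer, so the
// compressor's output lands in its final buffer with no intermediate copy.
// Assumes the buffer is pre-sized; a real version would handle overflow.
public class ByteBufferOutputStream extends OutputStream {
    private final ByteBuffer buffer;

    public ByteBufferOutputStream(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    @Override
    public void write(int b) {
        buffer.put((byte) b); // may throw BufferOverflowException if full
    }

    @Override
    public void write(byte[] src, int off, int len) {
        buffer.put(src, off, len);
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer out = ByteBuffer.allocate(1024);
        try (GZIPOutputStream gzip = new GZIPOutputStream(new ByteBufferOutputStream(out))) {
            gzip.write("hello kafka".getBytes("UTF-8"));
        }
        out.flip(); // compressed bytes are already in place; no toByteArray() copy
        System.out.println(out.remaining() > 0);
    }
}
```

The point is that the compressor flushes straight into the destination buffer, so finishing compression requires no extra allocation or copy.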

For the decompression part, your iterator patch looks nice. It seems to make 
ByteBufferMessageSet.decompress obsolete once all callers are cleaned up to use 
your iterator.
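The iterator idea can be sketched as follows (a self-contained illustration, not the patch itself; the 4-byte length prefix and class name are assumptions): each step reads one message out of the compressed stream on demand, so no fully decompressed copy of the whole message set is ever materialized, which is what a decompress() helper would otherwise build.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of streaming (iterator-style) decompression over
// length-prefixed messages, instead of decompressing the whole set up front.
public class LazyMessageIteration {
    public static void main(String[] args) throws IOException {
        // Build a compressed stream of two length-prefixed messages.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(compressed))) {
            for (String m : new String[] {"m1", "m2"}) {
                byte[] payload = m.getBytes("UTF-8");
                out.writeInt(payload.length); // assumed 4-byte size prefix
                out.write(payload);
            }
        }

        // Iterate lazily: each pass reads exactly one message from the
        // decompressing stream; nothing is buffered beyond the current message.
        DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(compressed.toByteArray())));
        int count = 0;
        while (true) {
            int size;
            try {
                size = in.readInt();
            } catch (EOFException end) {
                break; // end of compressed message set
            }
            byte[] payload = new byte[size];
            in.readFully(payload);
            count++;
        }
        System.out.println(count);
    }
}
```

With this shape, decompression cost is paid per message as the consumer advances, which is why a bulk decompress() step becomes unnecessary once callers go through the iterator.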


> Compression support does numerous byte copies
> ---------------------------------------------
>
>                 Key: KAFKA-527
>                 URL: https://issues.apache.org/jira/browse/KAFKA-527
>             Project: Kafka
>          Issue Type: Bug
>          Components: compression
>            Reporter: Jay Kreps
>            Assignee: Yasuhiro Matsuda
>            Priority: Critical
>         Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, 
> java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely 
> inefficient. We do something like 7 (?) complete copies of the data, often 
> for simple things like adding a 4 byte size to the front. I am not sure how 
> this went by unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk 
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the 
> Message/MessageSet interfaces which are based on byte buffers is the cause of 
> many of these.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
