[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728575#comment-13728575 ]

Jay Kreps commented on KAFKA-527:
---------------------------------

The general idea of the refactoring would be to allow appending directly to a 
ByteBufferMessageSet. Perhaps we could add a static method 
Message.write(byteBuffer, key, value) and a ByteBufferMessageSet.append that 
works in place.
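As a rough illustration of the in-place idea (the field layout below is made 
up for the sketch, not Kafka's actual on-disk message format):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of Message.write(byteBuffer, key, value): serialize
// straight into the caller's buffer instead of building intermediate arrays.
public class MessageWriteSketch {
    static void write(ByteBuffer buffer, byte[] key, byte[] value) {
        buffer.putInt(key.length);   // key length prefix
        buffer.put(key);             // key bytes, written in place
        buffer.putInt(value.length); // value length prefix
        buffer.put(value);           // value bytes, written in place
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        byte[] k = "k".getBytes(StandardCharsets.UTF_8);
        byte[] v = "hello".getBytes(StandardCharsets.UTF_8);
        write(buf, k, v);
        // 4 + 1 + 4 + 5 = 14 bytes written, zero copies of key/value
        System.out.println(buf.position());  // prints 14
    }
}
```

A ByteBufferMessageSet.append built on this would advance the buffer's 
position rather than allocating a per-message byte[].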

The compression codec would likely need to change from an OutputStream and 
InputStream to something that works directly with byte[].
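A hedged sketch of what an array-based codec could look like, backed by 
deflate since the JDK's Deflater/Inflater already expose byte[] calls (the 
interface name and signatures are illustrative, not an actual Kafka API):

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ArrayCodecSketch {
    // Hypothetical interface: operate on arrays and offsets rather than
    // streams, so data can move between ByteBuffers without stream adapters.
    interface ArrayCompressionCodec {
        int compress(byte[] src, int off, int len, byte[] dest, int destOff);
        int decompress(byte[] src, int off, int len, byte[] dest, int destOff)
                throws DataFormatException;
    }

    static class DeflateArrayCodec implements ArrayCompressionCodec {
        public int compress(byte[] src, int off, int len,
                            byte[] dest, int destOff) {
            Deflater d = new Deflater();
            d.setInput(src, off, len);
            d.finish();
            int n = d.deflate(dest, destOff, dest.length - destOff);
            d.end();
            return n;  // compressed bytes written into dest
        }

        public int decompress(byte[] src, int off, int len,
                              byte[] dest, int destOff)
                throws DataFormatException {
            Inflater i = new Inflater();
            i.setInput(src, off, len);
            int n = i.inflate(dest, destOff, dest.length - destOff);
            i.end();
            return n;  // decompressed bytes written into dest
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] input = "hello hello hello hello".getBytes("UTF-8");
        byte[] compressed = new byte[128];
        byte[] restored = new byte[128];
        DeflateArrayCodec codec = new DeflateArrayCodec();
        int clen = codec.compress(input, 0, input.length, compressed, 0);
        int dlen = codec.decompress(compressed, 0, clen, restored, 0);
        System.out.println(new String(restored, 0, dlen, "UTF-8"));
        // prints "hello hello hello hello"
    }
}
```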

This is straightforward for snappy, but gzip requires a little more work to 
get the header right, since I think the JDK only provides array access through 
the more generic inflate/deflate codec (see Deflater.java and 
GZIPOutputStream.java in the jdk).
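One hypothetical way to do that: run Deflater in nowrap mode and write the 
10-byte gzip header and 8-byte trailer by hand, per the gzip format (RFC 
1952). Everything below is an illustrative sketch, not Kafka code:

```java
import java.io.ByteArrayInputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

public class GzipArraySketch {
    static int gzipCompress(byte[] src, byte[] dest) {
        int pos = 0;
        // 10-byte gzip header
        dest[pos++] = (byte) 0x1f;  // magic byte 1
        dest[pos++] = (byte) 0x8b;  // magic byte 2
        dest[pos++] = 8;            // CM = deflate
        dest[pos++] = 0;            // FLG = no extra fields
        pos += 4;                   // MTIME = 0 (dest is zero-initialized)
        dest[pos++] = 0;            // XFL
        dest[pos++] = (byte) 0xff;  // OS = unknown
        // raw deflate body (nowrap = true suppresses the zlib wrapper)
        Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        d.setInput(src, 0, src.length);
        d.finish();
        pos += d.deflate(dest, pos, dest.length - pos);
        d.end();
        // 8-byte trailer: CRC32 then uncompressed size, both little-endian
        CRC32 crc = new CRC32();
        crc.update(src, 0, src.length);
        long c = crc.getValue();
        for (int i = 0; i < 4; i++) dest[pos++] = (byte) (c >> (8 * i));
        for (int i = 0; i < 4; i++) dest[pos++] = (byte) (src.length >> (8 * i));
        return pos;  // total gzip-framed bytes written
    }

    public static void main(String[] args) throws Exception {
        byte[] input = "gzip without streams".getBytes("UTF-8");
        byte[] out = new byte[256];
        int n = gzipCompress(input, out);
        // sanity check: a stock GZIPInputStream can read the frame back
        GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(out, 0, n));
        byte[] restored = in.readAllBytes();
        System.out.println(new String(restored, "UTF-8"));
        // prints "gzip without streams"
    }
}
```

This avoids GZIPOutputStream entirely while staying byte-for-byte compatible 
with gzip readers.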
                
> Compression support does numerous byte copies
> ---------------------------------------------
>
>                 Key: KAFKA-527
>                 URL: https://issues.apache.org/jira/browse/KAFKA-527
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jay Kreps
>         Attachments: java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely 
> inefficient. We do something like 7 (?) complete copies of the data, often 
> for simple things like adding a 4 byte size to the front. I am not sure how 
> this went by unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk 
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the 
> Message/MessageSet interfaces which are based on byte buffers is the cause of 
> many of these.

