[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728575#comment-13728575 ]
Jay Kreps commented on KAFKA-527:
---------------------------------

The general idea of the refactoring would be to allow appending directly to a ByteBufferMessageSet. Perhaps we could add a static method Message.write(byteBuffer, key, value) and a ByteBufferMessageSet.append that works in place. The compression codec would likely need to change from an OutputStream/InputStream interface to something that works directly with byte[]. This is straightforward for snappy but requires a little more work for gzip to get the header right, since I think the JDK only provides array access from the more generic inflate/deflate codec (see Deflater.java and GZIPOutputStream.java in the JDK).

> Compression support does numerous byte copies
> ---------------------------------------------
>
>                 Key: KAFKA-527
>                 URL: https://issues.apache.org/jira/browse/KAFKA-527
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jay Kreps
>         Attachments: java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely
> inefficient. We do something like 7 (?) complete copies of the data, often
> for simple things like adding a 4-byte size to the front. I am not sure how
> this went unnoticed.
> This is likely the root cause of the performance issues we saw when doing bulk
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the
> Message/MessageSet interfaces, which are based on byte buffers, is the cause of
> many of these copies.
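To illustrate the in-place append idea from the comment above, here is a minimal sketch of what a Message.write(byteBuffer, key, value)-style helper might look like. The field layout (4-byte size prefix, 4-byte key length, key bytes, value bytes) is purely illustrative and does not reflect Kafka's actual on-disk message format; the point is that the record, including its size prefix, is written straight into the destination buffer with no intermediate byte[] copies.

```java
import java.nio.ByteBuffer;

public class InPlaceAppend {
    // Hypothetical layout: [payloadSize][keyLen][key][value], written directly
    // into buf at its current position -- no temporary arrays, no stream wrappers.
    static void write(ByteBuffer buf, byte[] key, byte[] value) {
        int payload = 4 + key.length + value.length; // keyLen field + key + value
        buf.putInt(payload);   // 4-byte size prefix written in place
        buf.putInt(key.length);
        buf.put(key);
        buf.put(value);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        write(buf, "k".getBytes(), "hello".getBytes());
        buf.flip();
        // Read back the size prefix: 4 (keyLen field) + 1 (key) + 5 (value) = 10
        System.out.println(buf.getInt());
    }
}
```

Contrast this with a stream-based path, where adding the same size prefix typically means materializing the message in a temporary array first, then copying it behind the prefix.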
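The JDK classes the comment points at do expose array-based access: java.util.zip.Deflater and Inflater operate on byte[] directly, without the stream wrappers. A small sketch of a round trip using only the array API (what a gzip codec would need to build on, after handling the gzip header itself):

```java
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ArrayDeflate {
    public static void main(String[] args) throws Exception {
        byte[] input = "payload payload payload payload".getBytes("UTF-8");

        // Compress straight from/to byte arrays -- no OutputStream involved.
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] compressed = new byte[input.length + 64];
        int clen = deflater.deflate(compressed);
        deflater.end();

        // Decompress the same way.
        Inflater inflater = new Inflater();
        inflater.setInput(compressed, 0, clen);
        byte[] restored = new byte[input.length];
        int rlen = inflater.inflate(restored);
        inflater.end();

        System.out.println(rlen == input.length
                && new String(restored, 0, rlen, "UTF-8").equals("payload payload payload payload"));
    }
}
```

GZIPOutputStream gets its header and CRC trailer for free from the stream machinery, which is why, as noted above, a byte[]-based gzip codec takes a little extra work: the header and trailer would have to be written around the raw deflate output by hand.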