[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352290#comment-14352290 ]
Yasuhiro Matsuda commented on KAFKA-527:
----------------------------------------

>> This patch is mainly aimed at #1 above

If you read the patch carefully, there is more to it for the compression part. It avoids copies to an intermediate buffer (byte array) when we go from ByteArrayOutputStream to ByteBuffer, and also a copy from ByteBuffer to ByteBuffer when we create a MessageSet from a Message at the end of compression. For the decompression part, your iterator patch looks nice. It seems to make ByteBufferMessageSet.decompress obsolete if you clean up all callers to use your iterator.

> Compression support does numerous byte copies
> ---------------------------------------------
>
>                 Key: KAFKA-527
>                 URL: https://issues.apache.org/jira/browse/KAFKA-527
>             Project: Kafka
>          Issue Type: Bug
>          Components: compression
>            Reporter: Jay Kreps
>            Assignee: Yasuhiro Matsuda
>            Priority: Critical
>         Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4-byte size to the front. I am not sure how this went unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces, which are based on byte buffers, is the cause of many of these.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
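The copy-avoidance idea discussed in the comment, writing compressed output straight into a ByteBuffer instead of going through ByteArrayOutputStream.toByteArray(), can be sketched as below. This is a hypothetical illustration, not the actual KAFKA-527 patch; the class name ByteBufferBackedOutputStream and its growth policy are assumptions for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: an OutputStream that writes directly into a ByteBuffer,
// so compressed bytes land in the buffer without the extra byte[] copy that
// ByteArrayOutputStream.toByteArray() would make.
class ByteBufferBackedOutputStream extends OutputStream {
    private ByteBuffer buffer;

    ByteBufferBackedOutputStream(int initialCapacity) {
        this.buffer = ByteBuffer.allocate(initialCapacity);
    }

    private void ensureCapacity(int extra) {
        if (buffer.remaining() < extra) {
            // Grow by doubling; a real implementation might size this more carefully.
            ByteBuffer grown = ByteBuffer.allocate(
                    Math.max(buffer.capacity() * 2, buffer.position() + extra));
            buffer.flip();
            grown.put(buffer);
            buffer = grown;
        }
    }

    @Override
    public void write(int b) {
        ensureCapacity(1);
        buffer.put((byte) b);
    }

    @Override
    public void write(byte[] src, int off, int len) {
        ensureCapacity(len);
        buffer.put(src, off, len);
    }

    /** Returns a read-ready view of the written bytes; no copy is made. */
    ByteBuffer toReadableBuffer() {
        ByteBuffer result = buffer.duplicate();
        result.flip();
        return result;
    }
}

public class CompressDemo {
    public static void main(String[] args) throws IOException {
        byte[] payload = "hello kafka compression".getBytes("UTF-8");
        ByteBufferBackedOutputStream out = new ByteBufferBackedOutputStream(16);
        // GZIP stands in for whatever codec the broker uses; the point is that
        // the compressor's output lands in the ByteBuffer directly.
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(payload);
        }
        ByteBuffer compressed = out.toReadableBuffer();
        System.out.println("compressed bytes: " + compressed.remaining());
    }
}
```

The same shape in reverse (an InputStream view over a ByteBuffer feeding the decompressor) is what makes an iterator-based decompression path possible without materializing intermediate arrays.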