[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies

Guozhang Wang (JIRA) Fri, 06 Mar 2015 16:10:53 -0800

    [ 
https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351177#comment-14351177
 ]


Guozhang Wang commented on KAFKA-527:
-------------------------------------

Thanks for the patch, this is very promising.

There are a couple of issues we want to resolve here:

1. ByteArrayOutputStream copies data upon overflowing and resizing.

2. Compressed stream needs one extra copy upon finishing reading / writing.
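
To make #1 concrete, here is a minimal sketch (not code from the patch) that 
counts how often java.io.ByteArrayOutputStream reallocates its backing array 
while chunks are appended. CopyCountingDemo, CountingStream and resizesFor are 
hypothetical names used only for this illustration; the chunk sizes are 
arbitrary.

```java
import java.io.ByteArrayOutputStream;

// Illustration only: these class and method names are hypothetical, not Kafka code.
final class CopyCountingDemo {

    /** Counts reallocations of the protected backing array `buf`. */
    static final class CountingStream extends ByteArrayOutputStream {
        int resizes = 0;

        CountingStream(int initialCapacity) {
            super(initialCapacity);
        }

        @Override
        public synchronized void write(byte[] b, int off, int len) {
            byte[] before = buf;
            super.write(b, off, len);
            // A new array instance means the old contents were copied over.
            if (buf != before) {
                resizes++;
            }
        }
    }

    /** Appends `chunks` chunks of `chunkSize` bytes and returns the resize count. */
    static int resizesFor(int initialCapacity, int chunkSize, int chunks) {
        CountingStream out = new CountingStream(initialCapacity);
        byte[] chunk = new byte[chunkSize];
        for (int i = 0; i < chunks; i++) {
            out.write(chunk, 0, chunk.length);
        }
        return out.resizes;
    }

    public static void main(String[] args) {
        // Default-sized stream: the backing array grows repeatedly.
        System.out.println("resizes (initial 32 bytes): " + resizesFor(32, 1024, 64));
        // Pre-sized stream: the 64 KB written fits without any growth.
        System.out.println("resizes (initial 64 KB):    " + resizesFor(64 * 1024, 1024, 64));
    }
}
```

Each resize copies everything written so far into the new array, and 
toByteArray() allocates yet another copy when the caller retrieves the result.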

This patch is mainly aimed at #1 above, and I have uploaded a patch that 
optimizes the decompressed iterator, as an example of resolving #2. In 
addition, I think in the end we will deprecate ByteBufferMessageSet and move to 
o.a.k.c.r.MemoryRecords, which will resolve both points above. We can discuss 
whether we want to incorporate these patches into ByteBufferMessageSet now or 
just wait for the migration and improve on o.a.k.c.r.MemoryRecords. 

For example, today MemoryRecords's write pattern only supports appending 
messages with a pre-defined "records batch size", and it tries to close the 
batch when that size is approached. In ByteBufferMessageSet.create() we are 
given a set of messages without a predefined batch size, but it is still 
possible to derive the size from the estimated compression ratio, as we do in 
Compressor, such that in the worst case only one or two buffer expansions 
(i.e. data copies) are needed. This is just an alternative to the linked-list 
buffers proposed in this patch.

> Compression support does numerous byte copies
> ---------------------------------------------
>
>                 Key: KAFKA-527
>                 URL: https://issues.apache.org/jira/browse/KAFKA-527
>             Project: Kafka
>          Issue Type: Bug
>          Components: compression
>            Reporter: Jay Kreps
>            Assignee: Yasuhiro Matsuda
>            Priority: Critical
>         Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, 
> java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely 
> inefficient. We do something like 7 (?) complete copies of the data, often 
> for simple things like adding a 4 byte size to the front. I am not sure how 
> this went by unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk 
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the 
> Message/MessageSet interfaces which are based on byte buffers is the cause of 
> many of these.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
