[https://issues.apache.org/jira/browse/KAFKA-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939543#comment-13939543]

Guozhang Wang commented on KAFKA-1253:
--------------------------------------

Here are the experimental results for various heuristics to reduce the chance of
reallocation. First, the heuristics we compared:

0. Compute two estimates and take the smaller value: (a) the uncompressed data
appended so far * compression rate; (b) the compressed data written to the
underlying buffer so far + the compressor block size * compression rate.

1. Estimate based on the compressed data written to the underlying buffer so far
+ the compressor block size * compression rate.

2. Estimate based on the uncompressed data appended so far * compression rate.

3. Estimate the compressed size as the uncompressed data appended so far (i.e.
assuming a compression rate of 1).

4. Estimate the compressed size as only the data already written to the
underlying compressed buffer.
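To make the five estimates concrete, here is a minimal Java sketch of each one as a pure function. The class name `CompressedSizeEstimates` and all parameter names are hypothetical (not the producer's actual API), and rounding the scaled terms up with `Math.ceil` is an assumption; the comment above does not say how fractional bytes are handled.

```java
/**
 * Hypothetical sketch of the five compressed-size heuristics.
 * Parameter names are illustrative assumptions:
 *   uncompressedBytes - uncompressed bytes appended to the batch so far
 *   writtenBytes      - compressed bytes already flushed to the underlying buffer
 *   blockSize         - the compressor's block size (data that may still be
 *                       buffered inside the compressor, not yet flushed)
 *   rate              - estimated compression rate (compressed / uncompressed)
 */
final class CompressedSizeEstimates {
    private CompressedSizeEstimates() {}

    /** Heuristic 0: the smaller of heuristics 1 and 2. */
    static long heuristic0(long uncompressedBytes, long writtenBytes, int blockSize, double rate) {
        return Math.min(heuristic1(writtenBytes, blockSize, rate),
                        heuristic2(uncompressedBytes, rate));
    }

    /** Heuristic 1: flushed compressed bytes plus one block scaled by the rate. */
    static long heuristic1(long writtenBytes, int blockSize, double rate) {
        return writtenBytes + (long) Math.ceil(blockSize * rate);
    }

    /** Heuristic 2: uncompressed bytes scaled by the rate. */
    static long heuristic2(long uncompressedBytes, double rate) {
        return (long) Math.ceil(uncompressedBytes * rate);
    }

    /** Heuristic 3: assume no compression at all (rate = 1). */
    static long heuristic3(long uncompressedBytes) {
        return uncompressedBytes;
    }

    /** Heuristic 4: only flushed bytes; ignores data still inside the compressor. */
    static long heuristic4(long writtenBytes) {
        return writtenBytes;
    }
}
```

For example, with 10000 uncompressed bytes appended, 4000 compressed bytes flushed, an 8192-byte block and a rate of 0.5, heuristics 0 through 4 give 5000, 8096, 5000, 10000 and 4000. Heuristic 4 is never larger than heuristic 1 because it drops the in-compressor block term, so it can underestimate the batch size; that is consistent with it being the only heuristic with nonzero reallocations at the 1K and 10K message sizes in the tables below.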

The first experiments use 10K messages of random bits (which yields a compression
rate near 1) with batch size = 16K, and record the number of reallocations. Note
that each message append can trigger at most one reallocation:
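One way a buffer can guarantee "at most one reallocation per append" is sketched below in Java. The class `CountingBuffer` and its growth policy (on overflow, allocate once at max(2 * capacity, required size) and copy the old contents) are assumptions for illustration, not the producer's actual code, and the sketch writes raw bytes rather than modelling the compressor that produces the overflowing writes in the experiment.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical growable buffer that counts reallocations.
 * Assumed growth policy: when a write does not fit, allocate a new buffer
 * of max(2 * capacity, required) once, so one append grows it at most once.
 */
final class CountingBuffer {
    private ByteBuffer buffer;
    private int reallocations = 0;

    CountingBuffer(int initialCapacity) {
        this.buffer = ByteBuffer.allocate(initialCapacity);
    }

    /** Writes all bytes, expanding the buffer at most once if they do not fit. */
    void append(byte[] bytes) {
        if (buffer.remaining() < bytes.length) {
            int required = buffer.position() + bytes.length;
            int newCapacity = Math.max(buffer.capacity() * 2, required);
            ByteBuffer expanded = ByteBuffer.allocate(newCapacity);
            buffer.flip();        // switch to read mode to copy what was written
            expanded.put(buffer); // copy existing contents into the larger buffer
            buffer = expanded;
            reallocations++;
        }
        buffer.put(bytes);
    }

    int reallocations() { return reallocations; }

    int size() { return buffer.position(); }
}
```

With a 16K initial capacity, sixteen 1K appends fit without reallocating and the seventeenth triggers exactly one, while a single 100K append into a fresh 16K buffer also triggers exactly one. That matches the 100K column of the tables, where every one of the 10K appends reallocates once.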

GZIP:

|| heuristic / message size || 1K || 10K || 100K ||
|heuristic0|0|0|10K|
|heuristic1|0|0|10K|
|heuristic2|0|0|10K|
|heuristic3|0|0|10K|
|heuristic4|4|4999|10K|

SNAPPY:

|| heuristic / message size || 1K || 10K || 100K ||
|heuristic0|0|0|10K|
|heuristic1|0|0|10K|
|heuristic2|0|0|10K|
|heuristic3|0|0|10K|
|heuristic4|1|4993|10K|

> Implement compression in new producer
> -------------------------------------
>
>                 Key: KAFKA-1253
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1253
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: producer 
>            Reporter: Jay Kreps
>            Assignee: Guozhang Wang
>         Attachments: KAFKA-1253.patch, KAFKA-1253_2014-02-21_16:15:21.patch, 
> KAFKA-1253_2014-02-21_17:55:52.patch, KAFKA-1253_2014-02-24_13:31:50.patch, 
> KAFKA-1253_2014-02-26_17:31:30.patch, KAFKA-1253_2014-03-06_17:48:11.patch, 
> KAFKA-1253_2014-03-07_16:34:33.patch, KAFKA-1253_2014-03-10_14:35:56.patch, 
> KAFKA-1253_2014-03-10_14:39:58.patch, KAFKA-1253_2014-03-10_15:27:47.patch, 
> KAFKA-1253_2014-03-14_13:46:40.patch, KAFKA-1253_2014-03-14_17:39:53.patch, 
> KAFKA-1253_2014-03-17_15:56:04.patch, compression-fix.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
