[ https://issues.apache.org/jira/browse/KAFKA-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939543#comment-13939543 ]
Guozhang Wang commented on KAFKA-1253:
--------------------------------------

Here are the experimental results for various heuristics to reduce the chance of reallocation.

First, the heuristics we compared:

0. Compute two estimates of the compressed data written: one as the un-compressed data appended so far * compression rate, the other as the current compressed data written to the underlying buffer + the compressor block size * compression rate. Take the smaller value.
1. Estimate based on the current compressed data written to the underlying buffer + the compressor block size * compression rate.
2. Estimate the compressed data written as the un-compressed data appended so far * compression rate.
3. Estimate the compressed data as just the un-compressed data appended so far (i.e. assuming a compression rate of 1).
4. Estimate the compressed data as just the data written to the underlying compressed buffer.

The first experiments were done with 10K random-bit messages (which result in a compression rate near 1) and batch size = 16K, recording the number of reallocations.
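The five heuristics listed above can be written down as a small Java sketch. This is only an illustration of the formulas as described in this comment, not the code in the attached patches. The class name, method name, and parameter names are all hypothetical, and heuristic 1 is read here as "compressed bytes written + (block size * compression rate)", which is an assumption about operator precedence.

```java
// Hypothetical sketch of the five size-estimation heuristics described above.
// None of these names come from the Kafka code base or the attached patches.
final class SizeEstimateSketch {

    private SizeEstimateSketch() {}

    /**
     * Estimates the compressed bytes a batch occupies.
     *
     * @param heuristic            heuristic number 0..4 as listed in the comment
     * @param uncompressedAppended un-compressed bytes appended so far
     * @param compressedWritten    compressed bytes already written to the underlying buffer
     * @param blockSize            compressor block size in bytes
     * @param rate                 assumed compression rate (compressed / un-compressed)
     */
    static long estimate(int heuristic,
                         long uncompressedAppended,
                         long compressedWritten,
                         int blockSize,
                         double rate) {
        // Heuristic 2: scale the un-compressed input by the compression rate.
        long fromInput = (long) Math.ceil(uncompressedAppended * rate);
        // Heuristic 1: bytes already flushed plus up to one pending compressor block
        // (assumed reading: written + blockSize * rate).
        long fromBuffer = compressedWritten + (long) Math.ceil(blockSize * rate);

        switch (heuristic) {
            case 0:
                return Math.min(fromInput, fromBuffer); // smaller of the two estimates
            case 1:
                return fromBuffer;
            case 2:
                return fromInput;
            case 3:
                return uncompressedAppended;            // assumes a compression rate of 1
            case 4:
                return compressedWritten;               // ignores data still inside the compressor
            default:
                throw new IllegalArgumentException("unknown heuristic: " + heuristic);
        }
    }
}
```

A caller would presumably compare such an estimate against the batch size before appending another message. Heuristic 4 only counts bytes that have already been flushed to the buffer, so it can under-estimate the final size, which is consistent with it being the only heuristic with non-zero reallocation counts for 1K and 10K messages in the tables.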
Note that each message append will at most trigger one reallocation:

GZIP:

|| message size || 1K || 10K || 100K ||
|heuristic0|0|0|10K|
|heuristic1|0|0|10K|
|heuristic2|0|0|10K|
|heuristic3|0|0|10K|
|heuristic4|4|4999|10K|

SNAPPY:

|| message size || 1K || 10K || 100K ||
|heuristic0|0|0|10K|
|heuristic1|0|0|10K|
|heuristic2|0|0|10K|
|heuristic3|0|0|10K|
|heuristic4|1|4993|10K|

> Implement compression in new producer
> -------------------------------------
>
>                 Key: KAFKA-1253
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1253
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: producer
>            Reporter: Jay Kreps
>            Assignee: Guozhang Wang
>         Attachments: KAFKA-1253.patch, KAFKA-1253_2014-02-21_16:15:21.patch,
> KAFKA-1253_2014-02-21_17:55:52.patch, KAFKA-1253_2014-02-24_13:31:50.patch,
> KAFKA-1253_2014-02-26_17:31:30.patch, KAFKA-1253_2014-03-06_17:48:11.patch,
> KAFKA-1253_2014-03-07_16:34:33.patch, KAFKA-1253_2014-03-10_14:35:56.patch,
> KAFKA-1253_2014-03-10_14:39:58.patch, KAFKA-1253_2014-03-10_15:27:47.patch,
> KAFKA-1253_2014-03-14_13:46:40.patch, KAFKA-1253_2014-03-14_17:39:53.patch,
> KAFKA-1253_2014-03-17_15:56:04.patch, compression-fix.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)