Andrew,

The recompression logic didn't change in 0.8.2.1. The broker still takes all messages in a single request, assigns them offsets, and recompresses them into a single compressed message.
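A minimal Python sketch of that broker-side step follows. It assumes a simplified model: zlib stands in for Snappy, and `WrapperMessage` and `append_request` are hypothetical names with a made-up framing, not Kafka's actual message format or code. What it shows is that everything arriving in one request shares one compressed wrapper, so compression can only exploit redundancy *within* a request.

```python
import zlib
from dataclasses import dataclass


# Hypothetical, simplified model of the broker-side step described above.
# zlib stands in for Snappy, and this framing is NOT Kafka's real format.
@dataclass
class WrapperMessage:
    base_offset: int  # offset assigned to the first inner message
    count: int        # number of inner messages in the wrapper
    payload: bytes    # all inner messages, compressed as one unit


def append_request(log_end_offset: int, messages: list[bytes]) -> WrapperMessage:
    """Assign consecutive offsets to every message in one produce request,
    then compress them together into a single wrapper message."""
    framed = bytearray()
    for i, msg in enumerate(messages):
        offset = log_end_offset + i
        # Frame each inner message as: 8-byte offset, 4-byte length, body.
        framed += offset.to_bytes(8, "big")
        framed += len(msg).to_bytes(4, "big")
        framed += msg
    return WrapperMessage(log_end_offset, len(messages), zlib.compress(bytes(framed)))


# Usage: one request carrying 100 small, similar messages.
batch = [b'{"event":"click","user":%d}' % i for i in range(100)]
wrapper = append_request(1000, batch)
print(wrapper.base_offset, wrapper.count, len(wrapper.payload))
```

If a producer sends one message per request, each wrapper holds a single message and the compressor has nothing to share across messages.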
Are you using mirror maker to copy data from the 0.8.1 cluster to the 0.8.2 cluster? If so, this may have to do with how the producer in mirror maker batches messages: since the broker compresses together only the messages that arrive in the same request, smaller producer batches mean less effective compression. Did you enable the new Java producer in mirror maker?

Thanks,
Jun

On Mon, May 11, 2015 at 12:53 PM, Olson,Andrew <aols...@cerner.com> wrote:
> After a recent 0.8.2.1 upgrade we noticed a significant increase in used
> filesystem space for our Kafka log data. We have another Kafka cluster
> still on 0.8.1.1 whose Kafka data is being copied over to the upgraded
> cluster, and it is clear that the disk consumption is higher on 0.8.2.1 for
> the same message data. The log retention config for the two clusters is the
> same also.
>
> We ran some tests to figure out what was happening, and it appears that in
> 0.8.2.1 the Kafka brokers re-compress each message individually (we’re
> using Snappy), while in 0.8.1.1 they applied the compression across an
> entire batch of messages written to the log. For producers sending large
> batches of small similar messages, the difference can be quite substantial
> (in our case, it looks like a little over 2x). Is this a bug, or the
> expected new behavior?
>
> thanks,
> Andrew
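The effect of request batch size on stored bytes can be approximated with a small experiment. This is a sketch under stated assumptions: zlib stands in for Snappy, the messages are synthetic, and `stored_bytes` is a hypothetical helper that compresses each group of `batch_size` messages as one unit (one wrapper per request). Real ratios depend on the codec and the data, so it illustrates the direction of the effect, not the 2x figure reported above.

```python
import zlib


def stored_bytes(messages: list[bytes], batch_size: int) -> int:
    """Total compressed size when each group of `batch_size` consecutive
    messages is compressed as one unit (one wrapper per produce request)."""
    total = 0
    for start in range(0, len(messages), batch_size):
        chunk = b"".join(messages[start:start + batch_size])
        total += len(zlib.compress(chunk))
    return total


# Synthetic stream of small, similar messages (hypothetical data).
messages = [
    b'{"host":"web-%02d","status":200,"path":"/api/v1/items"}' % (i % 20)
    for i in range(1000)
]

# batch_size=1 models per-message compression; larger values model
# producers that put more messages into each request.
for size in (1, 10, 100, 1000):
    print(f"batch_size={size:>4}: {stored_bytes(messages, size)} bytes")
```

With batch size 1 each message pays its own compression overhead and shares no redundancy with its neighbours, which matches the pattern of small, similar messages taking much more space when they are not compressed together.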