Hi Jun,

I figured it out this morning and opened
https://issues.apache.org/jira/browse/KAFKA-2189. It turned out to be a
bug in versions 1.1.1.2 through 1.1.1.6 of snappy-java that has recently
been fixed (I was very happy to see their new unit test named
"batchingOfWritesShouldNotAffectCompressedDataSize"). We will be rolling
1.1.1.7 out to our clusters as soon as we can.
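
For anyone wondering why batching matters so much for compressed size,
here is a rough sketch of the general effect. It is only an illustration:
it uses Python's zlib as a stand-in for Snappy, the message payloads are
made up, and the helper names (per_message_size, batch_size) are
hypothetical. It does not reproduce the snappy-java bug itself, only the
difference between compressing small similar messages one at a time and
compressing them together.

```python
import zlib


def per_message_size(messages):
    """Total bytes when every message is compressed on its own."""
    return sum(len(zlib.compress(m)) for m in messages)


def batch_size(messages):
    """Total bytes when the whole batch is compressed as one block."""
    return len(zlib.compress(b"".join(messages)))


# Made-up payloads: many small, similar messages (an assumption chosen
# to resemble the workload described in this thread).
messages = [
    b'{"id": %d, "status": "ok", "source": "sensor-a"}' % i
    for i in range(1000)
]

individual = per_message_size(messages)
batched = batch_size(messages)
print("per-message:", individual, "bytes; batched:", batched, "bytes")
```

The compressor can only exploit redundancy inside the block it is given,
so similar messages compressed separately cannot share anything with each
other. The exact ratio depends on the codec and the data.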

Regarding the mirror maker question, we wrote our own custom replication
code and are not using the mirror maker to copy the data between
clusters. We're still using the old java producer, and confirmed the
issue was present with both the 0.8.1.1 and 0.8.2.1 old producer clients.

thanks,
Andrew

On 5/12/15, 3:08 PM, "Jun Rao" <j...@confluent.io> wrote:

>Andrew,
>
>The recompression logic didn't change in 0.8.2.1. The broker still takes
>all messages in a single request, assigns offsets and recompresses them
>into a single compressed message.
>
>Are you using mirror maker to copy data from the 0.8.1 cluster to the
>0.8.2 cluster? If so, this may have to do with the batching in the
>producer in mirror maker. Did you enable the new java producer in mirror
>maker?
>
>Thanks,
>
>Jun
>
>
>On Mon, May 11, 2015 at 12:53 PM, Olson,Andrew <aols...@cerner.com> wrote:
>
>> After a recent 0.8.2.1 upgrade we noticed a significant increase in used
>> filesystem space for our Kafka log data. We have another Kafka cluster
>> still on 0.8.1.1 whose Kafka data is being copied over to the upgraded
>> cluster, and it is clear that the disk consumption is higher on 0.8.2.1
>> for the same message data. The log retention config for the two clusters
>> is also the same.
>>
>> We ran some tests to figure out what was happening, and it appears that
>> in 0.8.2.1 the Kafka brokers re-compress each message individually
>> (we're using Snappy), while in 0.8.1.1 they applied the compression
>> across an entire batch of messages written to the log. For producers
>> sending large batches of small similar messages, the difference can be
>> quite substantial (in our case, it looks like a little over 2x). Is this
>> a bug, or the expected new behavior?
>>
>> thanks,
>> Andrew
>>
