Hi Andrew,

I work with Todd and did our 0.8.2.1 testing with him.  I believe that the 
Kafka 0.8.x brokers recompress messages once they receive them, in order to 
assign offsets to them (see the 'Compression in Kafka' section of 
http://nehanarkhede.com/2013/03/28/compression-in-kafka-gzip-or-snappy/).  I 
expect that you will see an improvement with Snappy 1.1.1.7 (FWIW, our load 
generator's version of Snappy didn't change between our 0.8.1.1 and 0.8.2.1 
testing, and we still saw the IO hit on the broker side, which seems to 
confirm that the recompression happens on the broker).
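
If it helps, here is a rough sketch of a way to double-check which snappy-java 
library a broker JVM actually loads (Snappy.compress/uncompress and 
Snappy.getNativeLibraryVersion are snappy-java calls; the class name and the 
classpath invocation in the comment are just illustrative):

    import org.xerial.snappy.Snappy;

    // Quick check of which snappy-java a JVM loads.  Run it with the broker's
    // classpath, e.g. java -cp "libs/*:." CheckSnappy (invocation is just an
    // example for this sketch).
    public class CheckSnappy {
        public static void main(String[] args) throws Exception {
            // Round-trip a small payload so the native library really gets loaded.
            byte[] compressed = Snappy.compress("snappy version check".getBytes("UTF-8"));
            byte[] restored = Snappy.uncompress(compressed);
            System.out.println("snappy-java native library: "
                + Snappy.getNativeLibraryVersion());
            System.out.println("round trip ok: " + new String(restored, "UTF-8"));
        }
    }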

Thanks,
Matt Bruce


From: Andrew Otto [mailto:ao...@wikimedia.org]
Sent: Tuesday, August 11, 2015 3:15 PM
To: users@kafka.apache.org
Cc: Dan Andreescu <dandree...@wikimedia.org>; Joseph Allemandou 
<jalleman...@wikimedia.org>
Subject: Re: 0.8.2.1 upgrade causes much more IO

Hi Todd,

We are using snappy!  And we are using version 1.1.1.6 as of our upgrade to 
0.8.2.1 yesterday.  However, as far as I can tell, that is only relevant for 
Java producers, right?  Our main producers use librdkafka (the Kafka C lib) to 
produce, and in doing so use a built-in C version of snappy[1].
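
For reference, on the Java side snappy-java only comes into play when a 
producer enables snappy compression; a minimal sketch with the new 0.8.2 Java 
producer would look roughly like this (the broker address and topic name are 
made-up placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SnappyProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
            props.put("compression.type", "snappy");          // pulls in snappy-java
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

            KafkaProducer<byte[], byte[]> producer =
                new KafkaProducer<byte[], byte[]>(props);
            producer.send(new ProducerRecord<byte[], byte[]>("test-topic",
                "hello".getBytes()));
            producer.close();
        }
    }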

Even so, your issue sounds very similar to mine, and I don't have a full 
understanding of how brokers deal with compression, so I have updated the 
snappy-java version to 1.1.1.7 on one of our brokers.  We'll have to wait a 
while to see if the log sizes are actually smaller for data written to this 
broker.

Thanks!




[1] https://github.com/edenhill/librdkafka/blob/0.8.5/src/snappy.c
On Aug 11, 2015, at 12:58, Todd Snyder <tsny...@blackberry.com> wrote:

Hi Andrew,

Are you using Snappy Compression by chance?  When we tested the 0.8.2.1 upgrade 
initially we saw similar results and tracked it down to a problem with Snappy 
version 1.1.1.6 (https://issues.apache.org/jira/browse/KAFKA-2189).  We're 
running with Snappy 1.1.1.7 now and the performance is back to where it used to 
be.



From: Andrew Otto
Sent: Tuesday, August 11, 2015 12:26 PM
To: users@kafka.apache.org
Reply To: users@kafka.apache.org
Cc: Dan Andreescu; Joseph Allemandou
Subject: 0.8.2.1 upgrade causes much more IO


Hi all!

Yesterday I did a production upgrade of our 4-broker Kafka cluster from 
0.8.1.1 to 0.8.2.1.

When we did so, we were running our (varnishkafka) producers with 
request.required.acks = -1.  After switching to 0.8.2.1, producers saw produce 
response RTTs of >60 seconds.  I then switched to request.required.acks = 1, 
and producers settled down.  However, we then started seeing flapping ISRs 
about every 10 minutes.  We run Camus every 10 minutes.  If we disable Camus, 
then ISRs don't flap.

All of these issues seem to be a side effect of a larger problem.  The total 
amount of network and disk IO that the Kafka brokers are doing after the 
upgrade to 0.8.2.1 has tripled.  We were previously seeing about 20 MB/s 
incoming on broker interfaces; 0.8.2.1 knocks this up to around 60 MB/s.  Disk 
writes have tripled accordingly.  Disk reads have also increased by a huge 
amount, although I suspect this is a consequence of more data flying around 
somehow dirtying the disk cache.

You can see these changes in this dashboard: 
http://grafana.wikimedia.org/#/dashboard/db/kafka-0821-upgrade

The upgrade started at around 2015-08-10 14:30, and was completed on all 4 
brokers within a couple of hours.

Probably the most relevant is network rx_bytes on brokers.

[image: network rx_bytes on brokers]


We looked at Kafka .log file sizes and noticed that file sizes are indeed much 
larger than they were before this upgrade:

# hour          total .log size (bytes)
# 0.8.1.1
2015-08-10T04 38119109383
2015-08-10T05 46172089174
2015-08-10T06 46172182745
2015-08-10T07 53151490032
2015-08-10T08 53151892928
2015-08-10T09 55836248198
2015-08-10T10 57984054557
2015-08-10T11 63353197416
2015-08-10T12 68184938548
2015-08-10T13 69259218741
2015-08-10T14 79567698089
# Upgrade to 0.8.2.1 starts here
2015-08-10T15 133643184876
2015-08-10T16 168515916825
2015-08-10T17 181394338213
2015-08-10T18 177097927553
2015-08-10T19 183530782549
2015-08-10T20 178706680082
2015-08-10T21 178712665924
2015-08-10T22 171741495606
2015-08-10T23 169049665348
2015-08-11T00 163682183241
2015-08-11T01 165292426510


Aside from the request.required.acks change I mentioned above, we haven't made 
any config changes on brokers, producers, or consumers.  Our server.properties 
file is here: https://gist.github.com/ottomata/cdd270102287661c176a

Has anyone seen this before?  What could be the cause of all this extra data?  
Perhaps there is some compression config change that we missed, and the data 
is being sent or saved uncompressed?  Being sent uncompressed seems unlikely, 
as we would probably notice a larger network change on the producer side than 
we do (unless I'm looking at that wrong right now... :)).  Is there a quick 
way to tell if the data on disk is compressed?
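
One rough way to check that I can think of would be to read the attributes 
byte of the first record in a .log segment; a sketch, assuming the 0.8.x 
on-disk message format (magic byte 0, with the compression codec in the low 
bits of the attributes byte), is below.  I believe kafka.tools.DumpLogSegments 
would show something similar.

    import java.io.DataInputStream;
    import java.io.FileInputStream;

    // Sketch: decode the compression codec of the first record in a Kafka
    // 0.8.x .log segment.  Assumed layout (magic 0): 8-byte offset, 4-byte
    // message size, 4-byte CRC, 1-byte magic, 1-byte attributes, then
    // key/value.  Pass the segment file path as the first argument.
    public class LogCodecCheck {
        public static void main(String[] args) throws Exception {
            DataInputStream in = new DataInputStream(new FileInputStream(args[0]));
            try {
                in.readLong();               // offset of the first message
                in.readInt();                // message size
                in.readInt();                // CRC
                byte magic = in.readByte();  // should be 0 on 0.8.x
                byte attributes = in.readByte();
                int codec = attributes & 0x07;  // 0=none, 1=gzip, 2=snappy
                System.out.println("magic=" + magic + " codec=" + codec);
            } finally {
                in.close();
            }
        }
    }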


Thanks!
-Andrew Otto

