Hi Andrew,
Are you using Snappy compression by chance? When we tested the 0.8.2.1 upgrade initially we saw similar results and tracked it down to a problem with Snappy version 1.1.1.6 (https://issues.apache.org/jira/browse/KAFKA-2189). We’re running with Snappy 1.1.1.7 now and the performance is back to where it used to be. (A couple of quick checks, for the Snappy jar version on the brokers and for whether the on-disk data is actually compressed, are sketched at the end of this message, below your quoted mail.)

Sent from my BlackBerry 10 smartphone on the TELUS network.

From: Andrew Otto
Sent: Tuesday, August 11, 2015 12:26 PM
To: users@kafka.apache.org
Reply To: users@kafka.apache.org
Cc: Dan Andreescu; Joseph Allemandou
Subject: 0.8.2.1 upgrade causes much more IO

Hi all!

Yesterday I did a production upgrade of our 4 broker Kafka cluster from 0.8.1.1 to 0.8.2.1.

When we did so, we were running our (varnishkafka) producers with request.required.acks = -1. After switching to 0.8.2.1, producers saw produce response RTTs of >60 seconds. I then switched to request.required.acks = 1, and producers settled down.

However, we then started seeing flapping ISRs about every 10 minutes. We run Camus every 10 minutes. If we disable Camus, then ISRs don’t flap.

All of these issues seem to be a side effect of a larger problem. The total amount of network and disk IO that Kafka brokers are doing after the upgrade to 0.8.2.1 has tripled. We were previously seeing about 20 MB/s incoming on broker interfaces; 0.8.2.1 knocks this up to around 60 MB/s. Disk writes have tripled accordingly. Disk reads have also increased by a huge amount, although I suspect this is a consequence of more data flying around somehow dirtying the disk cache.

You can see these changes in this dashboard: http://grafana.wikimedia.org/#/dashboard/db/kafka-0821-upgrade

The upgrade started at around 2015-08-10 14:30, and was completed on all 4 brokers within a couple of hours. Probably the most relevant is network rx_bytes on brokers.

[inline image: network rx_bytes on brokers]

We looked at Kafka .log file sizes and noticed that file sizes are indeed much larger than they were before this upgrade:

# 0.8.1.1
2015-08-10T04   38119109383
2015-08-10T05   46172089174
2015-08-10T06   46172182745
2015-08-10T07   53151490032
2015-08-10T08   53151892928
2015-08-10T09   55836248198
2015-08-10T10   57984054557
2015-08-10T11   63353197416
2015-08-10T12   68184938548
2015-08-10T13   69259218741
2015-08-10T14   79567698089
# Upgrade to 0.8.2.1 starts here
2015-08-10T15  133643184876
2015-08-10T16  168515916825
2015-08-10T17  181394338213
2015-08-10T18  177097927553
2015-08-10T19  183530782549
2015-08-10T20  178706680082
2015-08-10T21  178712665924
2015-08-10T22  171741495606
2015-08-10T23  169049665348
2015-08-11T00  163682183241
2015-08-11T01  165292426510

Aside from the request.required.acks change I mentioned above, we haven’t made any config changes on brokers, producers, or consumers. Our server.properties file is here: https://gist.github.com/ottomata/cdd270102287661c176a

Has anyone seen this before? What could be the cause of more data here? Perhaps there is some compression config change that we missed that is causing this data to be sent or saved uncompressed? (Sent uncompressed is unlikely, as we would probably notice a larger network change on the producers than we do. Unless I’m looking at that wrong right now… :))

Is there a quick way to tell if the data is compressed?

Thanks!
-Andrew Otto
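
On the two checks mentioned above: to see whether the messages on disk are actually compressed, you can peek at the first few entries of a .log segment directly. The sketch below assumes the 0.8.x on-disk message format (magic byte 0), where the low two bits of the attributes byte carry the compression codec; the segment path in the comment is made up, so point it at one of your own segments. (I believe kafka.tools.DumpLogSegments with --print-data-log will also print the codec per message, if you prefer the bundled tool.)

#!/usr/bin/env python
# Rough sketch: report the compression codec of the first few entries in a
# Kafka 0.8.x .log segment. Assumes the v0 (magic=0) on-disk layout:
# offset (8 bytes), message size (4 bytes), then crc(4) magic(1) attributes(1) ...
import struct
import sys

CODECS = {0: 'none', 1: 'gzip', 2: 'snappy'}

def peek_codecs(path, max_entries=5):
    with open(path, 'rb') as f:
        for _ in range(max_entries):
            header = f.read(12)                 # offset (int64) + message size (int32)
            if len(header) < 12:
                break
            offset, size = struct.unpack('>qi', header)
            message = f.read(size)              # crc(4), magic(1), attributes(1), ...
            if len(message) < 6:
                break
            magic = message[4] if isinstance(message[4], int) else ord(message[4])
            attrs = message[5] if isinstance(message[5], int) else ord(message[5])
            codec = CODECS.get(attrs & 0x03, 'unknown')
            print('offset %d: magic=%d codec=%s' % (offset, magic, codec))

if __name__ == '__main__':
    # e.g.: python peek_codecs.py /var/kafka/data/mytopic-0/00000000000000000000.log
    # (hypothetical path; use a real segment from one of your brokers)
    peek_codecs(sys.argv[1])

If the producers are compressing with Snappy, the first entries should show codec=snappy (the wrapper message carries the codec bits); codec=none on everything would point at data being stored uncompressed. For the Snappy version itself, listing the snappy-java jar in the broker's libs directory (path varies by install) shows which version the broker will load; per KAFKA-2189 you want 1.1.1.7 rather than 1.1.1.6.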