I upgraded one of our Kafka clusters (9 nodes) from 0.8.2.3 to 0.9
following the instructions at
http://kafka.apache.org/documentation.html#upgrade

Most things seem to work fine based on our metrics. Something I noticed is
that the network out on 3 of the nodes goes up every 5-6 minutes. I see a
corresponding increase in the Kafka bytes out metric too.

I don't see any corresponding increase in the number of Kafka messages in,
Kafka bytes in, or even network in, so it seems that something inside
Kafka is generating the traffic. We don't know what could cause this
behavior, but it did not happen before the upgrade.

Here is part of our Kafka dashboard before the upgrade:
http://imgur.com/1nNM0f2

The charts above cover a duration of one hour. You can see that network
in/out (measured by Collectd) and Kafka bytes in/out (measured by
the Kafka broker and extracted using JMX) track the messages in.


Here is our Kafka dashboard after the upgrade:
http://imgur.com/AjehzAD

You can see that every 5 minutes both the Kafka bytes out metric (measured
by the Kafka broker) and the network out metric (measured by Collectd)
spike (by roughly 5x). Has this been seen before?
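
To pin down the spike period from exported metric samples rather than by
eye, something like the following could be used. This is only a rough
sketch: find_spikes and spike_intervals are hypothetical helper names, the
3x-over-median threshold is an arbitrary assumption, and the sample data is
made up (one sample per minute, a 5x spike every 5 minutes), not taken from
our dashboard.

```python
from statistics import median


def find_spikes(samples, factor=3.0):
    """Return timestamps whose value exceeds factor x the series median.

    samples: list of (epoch_seconds, bytes_out_per_sec) tuples.
    The threshold factor is an assumption; tune it for real data.
    """
    baseline = median(v for _, v in samples)
    return [t for t, v in samples if v > factor * baseline]


def spike_intervals(spike_times):
    """Return the seconds elapsed between consecutive spike timestamps."""
    return [b - a for a, b in zip(spike_times, spike_times[1:])]


# Made-up data: one sample per minute for 30 minutes,
# ~1 MB/s baseline with a 5x spike on every fifth sample.
samples = [(60 * i, 5_000_000 if i % 5 == 0 else 1_000_000) for i in range(30)]

spikes = find_spikes(samples)
print(spike_intervals(spikes))  # [300, 300, 300, 300, 300]
```

With finer-grained samples a single burst may span several consecutive
samples, so adjacent spike timestamps would need to be merged first.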

I have verified through the broker logs that all of our brokers are
running Kafka 0.9 with inter.broker.protocol.version = 0.9.0.0

Our producers and consumers are older clients, including some that use
the Java wrapper around the SimpleConsumer.

I also see messages (possibly unrelated) of this form in the broker logs:

2015-12-14T20:18:00.018Z INFO  [kafka-request-handler-2            ]
[kafka.server.KafkaApis              ]: [KafkaApi-6] Close connection due
to error handling produce request with correlation id 294218 from client
id  with ack=0

The message doesn't give any details on what the error was, and I don't
recall seeing these messages before the upgrade.
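
To see whether these messages line up with the bursts, the log lines can
be counted per minute and compared against the dashboard. A minimal sketch,
assuming each entry sits on a single line in the broker log (it is wrapped
in this email) and keeps the timestamp format shown above; count_per_minute
is a hypothetical helper name:

```python
import re
from collections import Counter

# Captures the timestamp truncated to the minute, e.g. "2015-12-14T20:18".
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\.\d{3}Z\s+INFO\s+.*"
    r"Close connection due to error handling produce request"
)


def count_per_minute(lines):
    """Count 'Close connection' produce-request errors per minute."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The first line is the message quoted above, joined onto one line.
# The second line is a made-up, unrelated entry that should be ignored.
sample_lines = [
    "2015-12-14T20:18:00.018Z INFO  [kafka-request-handler-2            ] "
    "[kafka.server.KafkaApis              ]: [KafkaApi-6] Close connection due "
    "to error handling produce request with correlation id 294218 from client "
    "id  with ack=0",
    "2015-12-14T20:18:05.000Z INFO  [main] [example.Other]: unrelated message",
]

for minute, n in sorted(count_per_minute(sample_lines).items()):
    print(minute, n)
```

If the per-minute counts peak at the same times as the bytes out spikes,
the two are probably related; if not, the messages are likely a separate
issue.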


So the problem (periodic bursts of network out / Kafka bytes out) only
happens on 3 of the 9 brokers that we upgraded. The CPU usage on the 3
brokers that exhibit the problem has also gone up:

https://imgur.com/nezPmQ6


Thanks,

Rajiv
