I upgraded one of our Kafka clusters (9 nodes) from 0.8.2.3 to 0.9 following the instructions at http://kafka.apache.org/documentation.html#upgrade
Most things seem to work fine based on our metrics. One thing I noticed is that the network out on 3 of the nodes goes up every 5-6 minutes, and I see a corresponding increase in the Kafka bytes out metric. I don't see any corresponding increase in Kafka messages in, Kafka bytes in, or even network in, so it seems like something inside Kafka is generating the traffic. We don't know what could cause this behavior, but it did not happen before the upgrade.

Here is part of our Kafka dashboard before the upgrade: http://imgur.com/1nNM0f2

The charts above cover a duration of one hour. You can see that network in/out (measured by Collectd) and Kafka bytes in/out (measured by the Kafka broker and extracted using JMX) track the messages in.

Here is our Kafka dashboard after the upgrade: http://imgur.com/AjehzAD

You can see that every 5 minutes both the Kafka bytes out metric (measured by the Kafka broker) and the network out metric (measured by Collectd) have a spike (maybe 5x). Has this been seen before?

I have verified through the broker logs that all of our brokers are running Kafka 0.9 with inter.broker.protocol.version = 0.9.0.0. Our producers and consumers are older ones, including ones that use the Java wrapper around the SimpleConsumer.

I also see messages of this form in the broker logs (possibly unrelated):

2015-12-14T20:18:00.018Z INFO [kafka-request-handler-2 ] [kafka.server.KafkaApis ]: [KafkaApi-6] Close connection due to error handling produce request with correlation id 294218 from client id with ack=0

The message doesn't give any details on what exactly the error was, and I don't recall seeing these before the upgrade.

To summarize, the problem (a periodic burst of network out / Kafka bytes out) only happens on 3 of the 9 brokers that we upgraded. The CPU usage on the 3 brokers that exhibit the problem has also gone up: https://imgur.com/nezPmQ6

Thanks,
Rajiv