We have been investigating unreasonably high CPU usage of the Kafka process when there's no _real_ activity going on between the consumers and the broker. We had this issue back in the 0.8.x days, and it is exactly the same as what's being tracked in this JIRA: https://issues.apache.org/jira/browse/KAFKA-493. We now use 0.9.0.1 (both the client libraries, with the new consumer APIs, and the broker). However, we still see CPU usage that looks a bit on the higher side when there's no real message production or consumption going on. Just connecting around 10-20 consumers on different topics of a single-broker Kafka instance surfaces this issue.

All our debugging so far points to the Processor threads on the broker side, which show high CPU usage. There are N such Processor threads, and they are always in the RUNNABLE state doing this:

"kafka-network-thread-0-PLAINTEXT-0" #21 prio=5 os_prio=0 tid=0x00007f1858c4a800 nid=0xc81 runnable [0x00007f18106cb000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000006c0046128> (a sun.nio.ch.Util$2)
    - locked <0x00000006c0046118> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000006c0046068> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.kafka.common.network.Selector.select(Selector.java:425)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:254)
    at kafka.network.Processor.run(SocketServer.scala:413)
    at java.lang.Thread.run(Thread.java:745)

From what we have narrowed down so far, this thread in itself isn't the "culprit", since when there are no consumers connected, the CPU usage isn't high. However, when a consumer connects and just waits for messages, these threads start playing a role in the high CPU usage. Our debugging shows that each of the X consumers that connect to the broker keeps doing 2 things while "idle":

1) A delayed operation every Y seconds which auto-commits offsets.
2) Sending a heartbeat to the broker every 3 seconds.

We disabled auto commit of offsets since that's the semantic we wanted, so #1 isn't really an issue. However, #2 is. The default heartbeat interval appears to be 3 seconds, which is too low, IMO. It translates to a network socket operation every 3 seconds, which then has to be processed by the broker-side Processor thread. With just a single consumer this doesn't make much of a difference, but as soon as you add more consumers, the broker-side Processor has to start processing each of these incoming heartbeats, which become too frequent. Even though the interval is 3 seconds, the heartbeats arriving at the broker can be much more frequent, since the 3-second interval is per consumer. So in practice, with X consumers, a heartbeat can arrive at this broker every second, or even every few milliseconds, which can contribute to this high CPU usage when the system is practically idle.
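The arithmetic behind this can be sketched roughly (a back-of-the-envelope estimate, assuming each consumer heartbeats independently at a fixed interval, so the aggregate arrival rate at the broker simply scales with the consumer count):

```java
public class HeartbeatRate {
    /**
     * Average gap between heartbeats arriving at the broker, assuming
     * each of the given consumers sends one heartbeat every intervalMs,
     * with arrivals spread evenly (an idealization).
     */
    static double avgArrivalGapMs(int consumers, long intervalMs) {
        return (double) intervalMs / consumers;
    }

    public static void main(String[] args) {
        // 20 idle consumers with the default 3-second heartbeat interval:
        // the broker sees a heartbeat roughly every 150 ms on average.
        System.out.println(avgArrivalGapMs(20, 3000));
    }
}
```

With 10-20 consumers, that default interval already means the Processor wakes up several times per second even though nothing is being produced or consumed.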

So, coming to the real question: why is the default heartbeat interval so low - 3 seconds? We increased it to 29 seconds (just 1 second less than the session timeout) per consumer (via consumer configs), and in addition to disabling auto commit, these changes have noticeably improved the CPU usage.
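For reference, the consumer-side changes above can be sketched as plain config properties (the property names are the standard new-consumer configs; the specific values simply mirror what we used, not a recommendation):

```java
import java.util.Properties;

public class IdleConsumerConfig {
    /** Consumer properties tuned to cut broker-side work when idle. */
    static Properties idleFriendlyConfig() {
        Properties props = new Properties();
        // We commit offsets ourselves, so turn off the delayed auto-commit.
        props.put("enable.auto.commit", "false");
        // Heartbeat just under the session timeout, instead of the 3 s default.
        props.put("session.timeout.ms", "30000");
        props.put("heartbeat.interval.ms", "29000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(idleFriendlyConfig());
    }
}
```

These properties would be merged into the usual bootstrap/deserializer config passed to the KafkaConsumer constructor.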

Ideally, what would be a better value for the heartbeat interval, one that doesn't unnecessarily flood the broker with these messages and force it to continuously process them?

-Jaikiran
