Hello,

We are running kafka 0.9.0.1 and using the simple consumer to consume from a topic with 8 partitions. The consumer JVM infrequently runs into large gc pauses (60s to 90s, stop the world gc). These gcs are unrelated to kafka. We are usually consuming at 5k messages/sec on the topic. Right after the garbage collection finishes, the consumption doesn't resume its original rate until after about 9 to 10mn. The recovery is gradual, it slowly goes back to its steady rate.

I've enabled DEBUG logs on the kafka client and noticed the following happening right after the gc pause finished 2017-02-10T03:21:30.932Z DEBUG [o.a.kafka9.clients.NetworkClient ] {}: Disconnecting from node 2 due to request timeout. 2017-02-10T03:21:30.943Z DEBUG [o.a.kafka9.clients.NetworkClient ] {}: Disconnecting from node 3 due to request timeout. 2017-02-10T03:21:30.943Z DEBUG [o.a.kafka9.clients.NetworkClient ] {}: Disconnecting from node 4 due to request timeout. 2017-02-10T03:21:30.944Z DEBUG [o.a.kafka9.clients.NetworkClient ] {}: Disconnecting from node 5 due to request timeout.

There were also logs about Fetch failures due to the disconnects. Also frequently occurring "Marking the coordinator 2147483639 dead", which I think is a red herring since we are not using the high level consumer/consumer groups.

After looking at the client code, I found out that the client disconnects from a node if any of its its requests linger for more than requestTimeoutMs, which is 40s by default. Since the underlying socket connections may still be healthy even after a 90s gc pause, I went ahead and changed that timeout to 2mn with the hope that it will keep the nodes from being disconnected (which I thought was the root cause of the slow recovery). However, that did not help either. With the new timeout, I don't see the disconnects happening anymore in the logs after the gc pause, however the recovery still has the same slow pattern which is puzzling.

Has anyone encountered something like this ? Any help is much appreciated.

Thanks.

--
Mahdi.

Reply via email to