li xiangyuan created KAFKA-9646:
-----------------------------------
Summary: kafka consumer cause high cpu usage
Key: KAFKA-9646
URL: https://issues.apache.org/jira/browse/KAFKA-9646
Project: Kafka
Issue Type: Improvement
Components: clients
Affects Versions: 2.3.0
Environment: centos-7 3.10.0-957.21.3.el7.x86_64
Reporter: li xiangyuan
Attachments: 0.10.0.1.svg, 2.4.0.svg, cpu_use
Recently we upgraded our Kafka servers from 0.10.0.1 to 2.3.0 successfully, and
because Kafka supports fetching records from the closest broker since 2.4.0, we
decided to upgrade our clients from 0.10.0.1 to 2.4.0 directly.
After the upgrade, we found that some applications use much more CPU than
before. The worst one went up from 45% to 70%, so we had to roll back this
application.
We profiled this application in a test environment (each run lasted 6 minutes)
and got a CPU flame graph for each of the 2 kafka-clients versions. I have
uploaded these files.
We found that after the upgrade to 2.4.0, Selector.selectNow causes the highest
CPU usage. This application subscribes to 20 topics, each with 6 consumer
threads, and 19 of the topics have a low produce rate (less than 1 message per
minute). We set fetch.max.wait.ms to 5000; CPU usage dropped a little but is
still high.
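For reference, a minimal sketch of how the fetch.max.wait.ms setting mentioned above could be applied to a consumer configuration. It uses only java.util.Properties so it runs without the kafka-clients jar. The class name, broker address and group id are placeholders, not values from our deployment; only fetch.max.wait.ms = 5000 comes from the description above.

```java
import java.util.Properties;

public class ConsumerConfigSketch {

    // Builds consumer properties with the longer fetch wait described above.
    public static Properties build() {
        Properties props = new Properties();
        // Placeholder values, assumed for illustration only.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Let the broker hold an empty fetch for up to 5 seconds
        // before answering (the value we tried).
        props.put("fetch.max.wait.ms", "5000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

These properties would then be passed to the KafkaConsumer constructor in the real application.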
Then I wrote a test application that subscribes to 1 topic with 120 consumer
threads. With the 2.4.0 client, CPU usage is about 40%. With the 0.10.0.1
client, CPU usage is less than 10%.
Then I tried 2.4.0 with a modified select method in
org.apache.kafka.common.network.Selector. The old code is below:
{code:java}
if (timeoutMs == 0L)
    return this.nioSelector.selectNow();
else
    return this.nioSelector.select(timeoutMs);
{code}
I changed it to:
{code:java}
if (timeoutMs == 0) {
    timeoutMs = 1;
}
return this.nioSelector.select(timeoutMs);
{code}
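To illustrate the difference between the two variants above outside of Kafka, here is a small standalone sketch using only java.nio. It counts how many select calls an idle selector completes in a fixed time window: selectNow() returns immediately, so a loop around it spins, while select(1) can block for up to 1 ms per call. The class and method names are made up for this example, and it does not reproduce the Kafka client's actual poll loop.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

public class SelectSpinDemo {

    // Counts how many select calls complete within durationMs on a
    // selector that has no registered channels (so nothing is ever ready).
    public static long countSelects(boolean useSelectNow, long durationMs) {
        try (Selector selector = Selector.open()) {
            long deadline = System.nanoTime() + durationMs * 1_000_000L;
            long calls = 0;
            while (System.nanoTime() < deadline) {
                if (useSelectNow) {
                    selector.selectNow();   // non-blocking, returns at once
                } else {
                    selector.select(1L);    // may block for up to 1 ms
                }
                calls++;
            }
            return calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("selectNow calls: " + countSelects(true, 200));
        System.out.println("select(1) calls: " + countSelects(false, 200));
    }
}
```

If the client ends up calling select with a zero timeout in a tight loop, the first pattern would keep a core busy even when no data arrives, which may be what the flame graph shows.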
After this change, CPU usage is about 20%. I have uploaded a picture of the CPU usage.
I'm wondering why Selector.selectNow causes such high CPU usage. Maybe the
2.4.0 client makes too many useless select calls? Or does Linux have a
performance issue when multiple threads call selectNow concurrently?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)