li xiangyuan created KAFKA-9646:
-----------------------------------

             Summary: Kafka consumer causes high CPU usage
                 Key: KAFKA-9646
                 URL: https://issues.apache.org/jira/browse/KAFKA-9646
             Project: Kafka
          Issue Type: Improvement
          Components: clients
    Affects Versions: 2.3.0
         Environment: centos-7 3.10.0-957.21.3.el7.x86_64

            Reporter: li xiangyuan
         Attachments: 0.10.0.1.svg, 2.4.0.svg, cpu_use

Recently we successfully upgraded our Kafka brokers from 0.10.0.1 to 2.3.0. Because
Kafka clients can fetch records from the closest replica since 2.4.0, we decided
to upgrade our clients directly from 0.10.0.1 to 2.4.0.

After the upgrade, we found that some applications use much more CPU than before.
The worst one went from 45% to 70%, so we had to roll that application back.

We profiled this application in a test environment (each run lasted 6 minutes)
and captured CPU flame graphs for both kafka-clients versions. I have uploaded
these files (0.10.0.1.svg and 2.4.0.svg).

We found that after the upgrade to 2.4.0, the selectNow call accounts for the
highest CPU usage. This application subscribes to 20 topics with 6 consumer
threads each, and 19 of those topics have a low produce rate (less than 1 message
per minute). Setting fetch.max.wait.ms to 5000 reduced CPU usage a little, but it
is still high.
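For reference, a minimal sketch of the consumer override mentioned above. It builds the settings with plain java.util.Properties so it compiles without kafka-clients; the class name, bootstrap address and group id are hypothetical placeholders, and only fetch.max.wait.ms=5000 comes from this report (the client default is 500 ms).

```java
import java.util.Properties;

// Hypothetical helper: assembles consumer settings as plain Properties,
// which could then be handed to the KafkaConsumer(Properties) constructor.
public class ConsumerFetchConfig {

    static Properties consumerProps() {
        Properties props = new Properties();
        // Placeholders for illustration only, not values from the report.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "example-group");
        // Let the broker hold a fetch with no new data for up to 5 s
        // (client default is 500 ms) before responding.
        props.put("fetch.max.wait.ms", "5000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps());
    }
}
```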

 

Then I wrote a test application that subscribes to 1 topic with 120 consumer
threads. With the 2.4.0 client, CPU usage is about 40%; with 0.10.0.1, it is less
than 10%.

Then I tried 2.4.0 with a modified select method in
org.apache.kafka.common.network.Selector. The original code is:
{code:java}
if (timeoutMs == 0L)
    return this.nioSelector.selectNow();
else
    return this.nioSelector.select(timeoutMs);
{code}
I changed it to:
{code:java}
if (timeoutMs == 0) {
    timeoutMs = 1;
}
return this.nioSelector.select(timeoutMs);
{code}
After this change, CPU usage dropped to about 20%. I have uploaded a picture of
the CPU usage (cpu_use).
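The patched logic can be tried outside Kafka with a standalone java.nio.channels.Selector. The class and method below are hypothetical and only mirror the two snippets above; the negative-timeout check is added for illustration. The trade-off is that a caller asking for a zero timeout may now wait up to 1 ms instead of returning immediately.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical standalone version of the patched select(long) logic.
public class ClampedSelect {

    // Never issues a non-blocking selectNow(): a zero timeout is raised to
    // 1 ms, so a caller that keeps passing 0 parks briefly instead of spinning.
    static int select(Selector nioSelector, long timeoutMs) throws IOException {
        if (timeoutMs < 0L)
            throw new IllegalArgumentException("timeout should be >= 0");
        if (timeoutMs == 0L)
            timeoutMs = 1L;
        return nioSelector.select(timeoutMs);
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            long start = System.nanoTime();
            int ready = select(selector, 0L);
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            // No channels are registered, so no keys can become ready.
            System.out.println("ready=" + ready + " elapsedMicros=" + elapsedMicros);
        }
    }
}
```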

I'm wondering why selectNow causes such high CPU usage. Maybe the 2.4.0 client
performs too many unnecessary selects? Or does Linux have a performance issue
when multiple threads call selectNow concurrently?
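One way to picture the first hypothesis: if a consumer thread's poll loop keeps reaching select with a timeout of 0 while no data arrives, every selectNow() returns at once and the thread spins on system calls. The standalone sketch below (plain java.nio, not Kafka code; names are hypothetical) counts how many selectNow() versus select(1) calls one thread makes on an idle selector in the same window. It does not show that the 2.4.0 client actually passes a zero timeout that often; that remains the open question here.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Illustrative comparison on an idle selector, not Kafka code.
public class SelectSpinDemo {

    // Counts select calls made within windowMs; useSelectNow picks the
    // non-blocking path, otherwise each call may block for up to 1 ms.
    static long countSelects(boolean useSelectNow, long windowMs) throws IOException {
        long calls = 0;
        try (Selector selector = Selector.open()) {
            long deadline = System.nanoTime() + windowMs * 1_000_000L;
            while (System.nanoTime() < deadline) {
                if (useSelectNow)
                    selector.selectNow();   // returns immediately, so the loop spins
                else
                    selector.select(1L);    // parks the thread for up to 1 ms
                calls++;
            }
        }
        return calls;
    }

    public static void main(String[] args) throws IOException {
        long spinning = countSelects(true, 100L);
        long blocking = countSelects(false, 100L);
        System.out.println("selectNow calls in 100 ms: " + spinning);
        System.out.println("select(1) calls in 100 ms: " + blocking);
    }
}
```

With 120 consumer threads each in such a loop, the difference in system-call rate would be multiplied accordingly.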



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
