li xiangyuan created KAFKA-9646:
-----------------------------------

Summary: kafka consumer cause high cpu usage
Key: KAFKA-9646
URL: https://issues.apache.org/jira/browse/KAFKA-9646
Project: Kafka
Issue Type: Improvement
Components: clients
Affects Versions: 2.3.0
Environment: centos-7 3.10.0-957.21.3.el7.x86_64
Reporter: li xiangyuan
Attachments: 0.10.0.1.svg, 2.4.0.svg, cpu_use

Recently we successfully upgraded our Kafka servers from 0.10.0.1 to 2.3.0. Because Kafka has supported fetching records from the closest broker since 2.4.0, we decided to upgrade our clients directly from 0.10.0.1 to 2.4.0. After the upgrade, we found that some applications used much more CPU than before. In the worst case, CPU usage rose from 45% to 70%, so we had to roll that application back.

We profiled this application in a test environment (each run lasted 6 minutes) and captured CPU flame graphs for both kafka-clients versions. I have uploaded these files. The flame graphs show that, with 2.4.0, Selector.selectNow() accounts for the largest share of CPU time.

This application subscribes to 20 topics with 6 consumer threads each, and 19 of those topics have a low produce rate (less than 1 message per minute). Setting fetch.max.wait.ms to 5000 reduced CPU usage only slightly; it remained high.

I then wrote a test application that subscribes to 1 topic with 120 consumer threads. With the 2.4.0 client, CPU usage was about 40%; with 0.10.0.1, it was below 10%.

Next I tried the 2.4.0 client with a modified select method in org.apache.kafka.common.network.Selector. The original code:

{code:java}
if (timeoutMs == 0L)
    return this.nioSelector.selectNow();        // non-blocking: returns immediately
else
    return this.nioSelector.select(timeoutMs);  // blocks for up to timeoutMs
{code}

was changed to:

{code:java}
// Never take the non-blocking path: a zero timeout becomes a 1 ms blocking select
if (timeoutMs == 0L) {
    timeoutMs = 1L;
}
return this.nioSelector.select(timeoutMs);
{code}

After this change, CPU usage dropped to about 20%. I have uploaded a CPU usage picture.

I'm wondering why Selector.selectNow() causes such high CPU usage. Does the 2.4.0 client perform too many useless selects? Or does Linux have a performance issue when many threads call selectNow() concurrently?
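The selectNow() question can be illustrated with plain java.nio, outside Kafka. The following is a minimal sketch assuming only the JDK; the class SelectNowSpin and its methods selectWithMinimumTimeout and countCalls are hypothetical names, and the 200 ms window is arbitrary. On an idle selector, selectNow() returns immediately, so a loop that keeps passing a zero timeout spins on the CPU, while the patched path blocks for up to 1 ms per call. The sketch does not explain why the 2.4.0 client reaches the zero-timeout path so often.

```java
import java.io.IOException;
import java.nio.channels.Selector;

/**
 * Hypothetical sketch (not Kafka code): compares how many times an idle
 * java.nio Selector can be polled with selectNow() versus a 1 ms select()
 * within the same wall-clock window.
 */
public class SelectNowSpin {

    // Mirrors the patched logic described above: a zero timeout becomes
    // a 1 ms blocking select instead of a non-blocking selectNow().
    static int selectWithMinimumTimeout(Selector selector, long timeoutMs) throws IOException {
        if (timeoutMs < 0L) {
            throw new IllegalArgumentException("timeout should be >= 0");
        }
        if (timeoutMs == 0L) {
            timeoutMs = 1L;
        }
        return selector.select(timeoutMs);
    }

    // Counts how many select calls complete within windowMs on the given selector.
    static long countCalls(Selector selector, long windowMs, boolean useSelectNow) throws IOException {
        long deadline = System.nanoTime() + windowMs * 1_000_000L;
        long calls = 0;
        while (System.nanoTime() < deadline) {
            if (useSelectNow) {
                selector.selectNow();                    // returns at once when nothing is ready
            } else {
                selectWithMinimumTimeout(selector, 0L);  // blocks up to ~1 ms when nothing is ready
            }
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) throws IOException {
        // No channels are registered, so the selector is always idle.
        try (Selector selector = Selector.open()) {
            long spinning = countCalls(selector, 200, true);
            long blocking = countCalls(selector, 200, false);
            System.out.println("selectNow() calls in 200 ms: " + spinning);
            System.out.println("select(1) calls in 200 ms:   " + blocking);
        }
    }
}
```

On an idle selector the selectNow() count is typically orders of magnitude larger than the select(1) count. The trade-off of the patch is that a zero timeout no longer means "do not block", so each such call may add up to 1 ms of latency when nothing is ready.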