ilya morgenshtern created KAFKA-10901:
-----------------------------------------

             Summary: Lock contention on high produce rate causing cluster 
degradation
                 Key: KAFKA-10901
                 URL: https://issues.apache.org/jira/browse/KAFKA-10901
             Project: Kafka
          Issue Type: Bug
          Components: producer 
    Affects Versions: 2.5.0
         Environment: broker: version 2.5.0 with 8 cores, 32 GB, HDD
producer: Sarama producer version 1.5.2 (Go) with 500ms linger and 2 MB batch size
            Reporter: ilya morgenshtern
         Attachments: Screen Shot 2021-01-04 at 11.46.08.png, Screen Shot 
2021-01-04 at 11.46.47.png

Scaling up the producers (20 -> 40) caused the idle percentage to drop from 
70-80% to 0-1%, the request queue size to increase by 200%, and overall 
producer latency to increase by 700%.
CPU usage also dropped by 30%.


After running some profiling, we saw high lock contention on the write 
requests, but CPU usage remained low. We didn't see any unusual activity in 
disk reads, writes, or IOPS. If anything, it was the other way around: because 
everything became slower, the cluster processed much less data.

!Screen Shot 2021-01-04 at 11.46.47.png|width=576,height=23!

In comparison, with 20 producers the produce/fetch ratio looked like this:

!Screen Shot 2021-01-04 at 11.46.08.png|width=567,height=31!

 

From limited observation, the number of produce requests from the upscaled 
producer increased from 1500 to 2500 per second, but the overall number of 
produce requests in the cluster remained the same. On the other hand, the 
number of fetch requests decreased by 50%.

To fix the issue, we increased linger.ms for this specific producer from 500ms 
to 1000ms, and the whole cluster suddenly became healthy.
 
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
