Point 2 may have an impact if the partitions are too big. Too many log segments will cause a correspondingly high number of IOPS. I am not an expert, though.
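To see whether segment count is a factor, something like the following could be run on the affected broker and on a healthy one for comparison. It is only a sketch: it assumes the usual on-disk layout in which every directory under a `log.dirs` path holds one partition and every `*.log` file in it is one segment. The function name `count_log_segments` is made up for illustration.

```python
import sys
from pathlib import Path


def count_log_segments(log_dir):
    """Return {partition_dir_name: (segment_count, total_segment_bytes)}.

    Assumes each sub-directory of log_dir is one partition and each
    *.log file inside it is one log segment (index files are ignored).
    """
    stats = {}
    for partition_dir in Path(log_dir).iterdir():
        if not partition_dir.is_dir():
            continue  # skip files such as meta.properties
        segments = list(partition_dir.glob("*.log"))
        if not segments:
            continue  # directory without segments, not a partition
        total_bytes = sum(f.stat().st_size for f in segments)
        stats[partition_dir.name] = (len(segments), total_bytes)
    return stats


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_segments.py <one of the broker's log.dirs paths>
    result = count_log_segments(sys.argv[1])
    # Print the partitions with the most segments first.
    for name, (count, size) in sorted(result.items(), key=lambda kv: -kv[1][0]):
        print(f"{name}\t{count} segments\t{size} bytes")
```

Partitions whose segment count is far above the rest would be the first ones to look at.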
On Wed, 9 Aug 2023 at 6:43 PM, Tiansu Yu <tiansu...@klarna.com.invalid> wrote:

> 1. We use cruise-control to actively balance the partitions across all
> brokers, so point 1 can be ruled out.
> 2. I am not sure how much this would impact the broker, as we do have some
> exceptionally large partitions around. I have to check whether they live
> on the aforementioned broker. So far I don't see a strong correlation
> between total producer / consumer byte rates and the CPU spikes on this
> broker.
>
> Tiansu Yu
> Engineer
> Data Ingestion & Streaming
>
> Klarna Bank AB German Branch
> Chausseestraße 117
> 10115 Berlin
> Tel: +49 221 669 501 00
> klarna.de
>
> Klarna Bank AB, German Branch
> Registered office: Berlin, Amtsgericht Charlottenburg HRB 217291 B
> VAT no.: DE 815 867 324
> Branch of Klarna Bank AB (publ), a stock corporation under Swedish law
> with head office in Stockholm,
> Swedish Companies Register 556737-0431
> Chairman of the Board: Michael Moritz
> Managing Director: Sebastian Siemiatkowski
> Branch Managers: Yaron Shaer, Björn Petersen
>
> On 9. Aug 2023, at 12:05, sunil chaudhari <sunilmchaudhar...@gmail.com> wrote:
>
> > Hi, I can guess two problems here.
> > 1. Either too many partitions are concentrated on this broker compared
> > to the other brokers,
> > 2. or the partitions on this broker are larger than the partitions on
> > the other brokers.
> >
> > Please check whether all brokers are evenly balanced in terms of the
> > number of partitions and the total topic size on each broker.
> >
> > On Wed, 9 Aug 2023 at 1:29 PM, Tiansu Yu <tiansu...@klarna.com.invalid> wrote:
> >
> > > Hi Kafka community,
> > >
> > > From time to time we have an issue with our Kafka cluster in which a
> > > single (one and only one) broker (leader) in the cluster reaches 100%
> > > CPU utilisation. We could not see any apparent issue in the metrics.
> > > There is no increase in heap memory usage, no excessive number of
> > > connections to the broker, and no misbehaving producers or consumers
> > > trying to dump or load excessively during these periods. The only
> > > difference we could see is that thread usage decreases during these
> > > periods. Despite the problem, the service is still available
> > > (understandable from Kafka's perspective).
> > >
> > > We are trying to understand what else might be causing the issue and
> > > how we can mitigate it.
> > >
> > > Tiansu Yu
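The balance check suggested earlier in the thread (number of partitions and total size per broker) can be sketched as follows. This is an illustration under assumptions, not a tested tool: it assumes the JSON printed by `kafka-log-dirs.sh --describe` has the shape `brokers[].logDirs[].partitions[].size` and is preceded by a few plain-text status lines, which should be verified against the Kafka version in use. The function names `summarize_brokers` and `parse_tool_output` are hypothetical.

```python
import json


def parse_tool_output(text):
    """Extract the JSON document from the tool's output.

    Assumes the JSON is printed on a single line that starts with '{',
    possibly after some plain-text status lines.
    """
    for line in text.splitlines():
        if line.lstrip().startswith("{"):
            return json.loads(line)
    raise ValueError("no JSON line found in the tool output")


def summarize_brokers(report):
    """Return {broker_id: (replica_count, total_bytes)}.

    Counts every replica hosted on a broker (leaders and followers),
    summed over all of that broker's log directories.
    """
    summary = {}
    for broker in report["brokers"]:
        count = 0
        size = 0
        for log_dir in broker["logDirs"]:
            for partition in log_dir["partitions"]:
                count += 1
                size += partition["size"]
        summary[broker["broker"]] = (count, size)
    return summary
```

The input would come from saving the output of `kafka-log-dirs.sh --bootstrap-server <host:port> --describe` to a file and passing its contents to `parse_tool_output`. Since the figures cover all replicas rather than leaders only, an even result here does not rule out an uneven leader distribution.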