Hi All,

Any suggestions? We are running into this issue in production, and any help would be greatly appreciated.
Thanks

On Mon, Jan 6, 2020 at 9:26 PM Navneeth Krishnan <reachnavnee...@gmail.com> wrote:

> Hi,
>
> Thanks for the response. We were using version 0.11 previously, and all our
> producers/consumers have been upgraded to either 1.0 or the latest 2.3.
>
> Is it normal for the network threads to consume this much CPU? As you can
> see, the network threads account for 50% of the overall CPU.
>
> Regards
>
> On Mon, Jan 6, 2020 at 7:04 PM Thunder Stumpges <thunder.stump...@gmail.com> wrote:
>
>> Not sure what version your producers/consumers are on, or whether you
>> upgraded from a previous version that used to work, but maybe you're
>> hitting this?
>>
>> https://kafka.apache.org/23/documentation.html#upgrade_10_performance_impact
>>
>> On Mon, Jan 6, 2020 at 12:48 PM Navneeth Krishnan <reachnavnee...@gmail.com> wrote:
>>
>> > Hi All,
>> >
>> > Any idea what can be done? Not sure if we are running into the bug below.
>> >
>> > https://issues.apache.org/jira/browse/KAFKA-7925
>> >
>> > Thanks
>> >
>> > On Thu, Jan 2, 2020 at 4:18 PM Navneeth Krishnan <reachnavnee...@gmail.com> wrote:
>> >
>> >> Hi All,
>> >>
>> >> We have a Kafka cluster with 12 nodes, and we are seeing roughly 90% CPU
>> >> usage on all of them. All the details are below; we need some help
>> >> figuring out what the problem is and how to overcome it.
>> >>
>> >> *Cluster:*
>> >> Kafka version: 2.3.0
>> >> Number of brokers in cluster: 12
>> >> Node type: 4 vCores, 32 GB memory
>> >> Network In: 10 Mbps per broker
>> >> Network Out: 16 Mbps per broker
>> >> Topics: 10 (approximately)
>> >> Partitions: 20 (max); some topics have fewer
>> >> Replication Factor: 3
>> >>
>> >> *CPU Usage:*
>> >> [image: image.png]
>> >>
>> >> *VMStat*
>> >>
>> >> [root]# vmstat 1 10
>> >> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>> >>  r  b   swpd    free   buff     cache   si   so    bi     bo     in     cs  us  sy  id  wa  st
>> >>  8  0      0  234444  19064  24046980    0    0    17   2026      1      3  38  33  28   0   1
>> >>  7  0      0  256444  19036  24023880    0    0   768      0  64027  22708  44  40  16   0   1
>> >>  7  0      0  245356  19052  24034560    0    0   256    472  63509  23276  44  39  17   0   1
>> >>  7  0      0  235096  19052  24046616    0    0     0      0  62277  22516  46  38  15   0   1
>> >>  8  0      0  260548  19036  24020084    0    0   516  49888  62364  22894  43  38  18   0   1
>> >>  5  0      0  249232  19036  24030924    0    0   512      0  61022  24589  41  39  20   0   1
>> >>  6  0      0  238072  19036  24042512    0    0  1024      0  63358  23063  44  38  17   0   0
>> >>  5  0      0  262904  19052  24017972    0    0     0    440  63078  23499  46  37  17   0   1
>> >>  7  0      0  250324  19052  24030008    0    0     0      0  64615  22617  48  38  14   0   1
>> >>  6  0      0  237920  19052  24042372    0    0  1024  48900  63223  23029  42  40  18   0   1
>> >>
>> >> *IO Stat:*
>> >>
>> >> [root]# iostat -m
>> >> Linux 4.14.72-73.55.amzn2.x86_64 (loc-kafka11.internal.dnaspaces.io)  01/02/2020  _x86_64_  (4 CPU)
>> >>
>> >> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>> >>           38.11    0.00   33.09    0.11    0.61   28.08
>> >>
>> >> Device:     tps  MB_read/s  MB_wrtn/s   MB_read   MB_wrtn
>> >> xvda       2.36       0.01       0.01     26760     43360
>> >> nvme0n1    0.00       0.00       0.00         2         0
>> >> xvdf      70.95       0.06       7.67    185908  25205338
>> >>
>> >> *Top Kafka broker threads:*
>> >> [image: image.png]
>> >>
>> >> *Top 3:*
>> >>
>> >> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0" #60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable [0x00007f8a886ce000]
>> >>
>> >> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-2" #62 prio=5 os_prio=0 tid=0x00007f8b1ab59000 nid=0x5821 runnable [0x00007f8a6aefd000]
>> >>
>> >> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-1" #61 prio=5 os_prio=0 tid=0x00007f8b1ab57800 nid=0x5820 runnable [0x00007f8a885cd000]
>> >>
>> >> It doesn't look like GC or I/O is the problem.
>> >>
>> >> Thanks
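A minimal sketch for cross-checking the diagnostics in this thread, assuming Python 3.9+. It averages the `vmstat 1 10` samples while skipping the first row, which vmstat reports as an average since boot rather than over the 1-second interval. It also converts the hexadecimal `nid` values from the HotSpot thread dump into the decimal thread IDs that `top -H -p <broker-pid>` prints on Linux, so the hot threads seen in `top` can be matched to thread names. The function names `vmstat_averages` and `nid_to_tid` are hypothetical helpers for illustration; the data is copied from the thread.

```python
# Sketch: cross-check the vmstat samples and thread-dump nids from the thread.
# vmstat_averages and nid_to_tid are hypothetical helpers, not a Kafka API.

# Column names in the order `vmstat` prints them.
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

# Rows of `vmstat 1 10` from the original message (header lines omitted).
VMSTAT_OUTPUT = """\
8 0 0 234444 19064 24046980 0 0 17 2026 1 3 38 33 28 0 1
7 0 0 256444 19036 24023880 0 0 768 0 64027 22708 44 40 16 0 1
7 0 0 245356 19052 24034560 0 0 256 472 63509 23276 44 39 17 0 1
7 0 0 235096 19052 24046616 0 0 0 0 62277 22516 46 38 15 0 1
8 0 0 260548 19036 24020084 0 0 516 49888 62364 22894 43 38 18 0 1
5 0 0 249232 19036 24030924 0 0 512 0 61022 24589 41 39 20 0 1
6 0 0 238072 19036 24042512 0 0 1024 0 63358 23063 44 38 17 0 0
5 0 0 262904 19052 24017972 0 0 0 440 63078 23499 46 37 17 0 1
7 0 0 250324 19052 24030008 0 0 0 0 64615 22617 48 38 14 0 1
6 0 0 237920 19052 24042372 0 0 1024 48900 63223 23029 42 40 18 0 1
"""


def vmstat_averages(text: str) -> dict[str, float]:
    """Average each vmstat column over the interval samples.

    The first row is skipped because vmstat reports it as an average
    since boot, not over the sampling interval.
    """
    rows = [
        dict(zip(VMSTAT_COLUMNS, map(int, line.split())))
        for line in text.splitlines()
        if line.strip()
    ]
    samples = rows[1:]
    return {col: sum(r[col] for r in samples) / len(samples) for col in VMSTAT_COLUMNS}


def nid_to_tid(nid: str) -> int:
    """Convert a thread-dump nid (hex, e.g. "0x581f") to the decimal
    thread ID shown by `top -H -p <broker-pid>`."""
    return int(nid, 16)


if __name__ == "__main__":
    avg = vmstat_averages(VMSTAT_OUTPUT)
    print(f"user {avg['us']:.1f}%  system {avg['sy']:.1f}%  idle {avg['id']:.1f}%")
    print(f"interrupts/s {avg['in']:.0f}  context switches/s {avg['cs']:.0f}")
    for nid in ("0x581f", "0x5821", "0x5820"):
        print(f"nid={nid} -> TID {nid_to_tid(nid)}")
```

On the nine interval samples this works out to roughly 44% user, 39% system and 17% idle, with about 63,000 interrupts and 23,000 context switches per second, and the three network threads listed above map to TIDs 22559, 22561 and 22560.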