Hi All,

We have a 12-node Kafka cluster and are seeing roughly 90% CPU usage on
all nodes. The details are below. We need help figuring out what is
causing this and how to fix it.

*Cluster:*
Kafka version: 2.3.0
Number of brokers in cluster: 12
Node type: 4 vCores 32GB mem
Network In: 10Mbps per broker
Network Out: 16Mbps per broker
Topics: 10 (approximately)
Partitions: 20 (max); some topics have fewer
Replication Factor: 3

*CPU Usage:*
[image: image.png]

*VMStat:*

[root]# vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
 8  0      0 234444  19064 24046980    0    0    17  2026     1     3 38 33 28  0  1
 7  0      0 256444  19036 24023880    0    0   768     0 64027 22708 44 40 16  0  1
 7  0      0 245356  19052 24034560    0    0   256   472 63509 23276 44 39 17  0  1
 7  0      0 235096  19052 24046616    0    0     0     0 62277 22516 46 38 15  0  1
 8  0      0 260548  19036 24020084    0    0   516 49888 62364 22894 43 38 18  0  1
 5  0      0 249232  19036 24030924    0    0   512     0 61022 24589 41 39 20  0  1
 6  0      0 238072  19036 24042512    0    0  1024     0 63358 23063 44 38 17  0  0
 5  0      0 262904  19052 24017972    0    0     0   440 63078 23499 46 37 17  0  1
 7  0      0 250324  19052 24030008    0    0     0     0 64615 22617 48 38 14  0  1
 6  0      0 237920  19052 24042372    0    0  1024 48900 63223 23029 42 40 18  0  1
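
To make the vmstat numbers easier to compare, here is a minimal Python sketch that averages the CPU and system columns over the samples above. The helper names (`parse_vmstat`, `mean`) are made up for illustration. It skips the first sample, because vmstat's first report is an average since boot rather than a 1-second reading.

```python
# Column order as printed by `vmstat 1 10` in the output above.
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]

# The ten samples from this post, copied verbatim.
SAMPLES = """\
8 0 0 234444 19064 24046980 0 0 17 2026 1 3 38 33 28 0 1
7 0 0 256444 19036 24023880 0 0 768 0 64027 22708 44 40 16 0 1
7 0 0 245356 19052 24034560 0 0 256 472 63509 23276 44 39 17 0 1
7 0 0 235096 19052 24046616 0 0 0 0 62277 22516 46 38 15 0 1
8 0 0 260548 19036 24020084 0 0 516 49888 62364 22894 43 38 18 0 1
5 0 0 249232 19036 24030924 0 0 512 0 61022 24589 41 39 20 0 1
6 0 0 238072 19036 24042512 0 0 1024 0 63358 23063 44 38 17 0 0
5 0 0 262904 19052 24017972 0 0 0 440 63078 23499 46 37 17 0 1
7 0 0 250324 19052 24030008 0 0 0 0 64615 22617 48 38 14 0 1
6 0 0 237920 19052 24042372 0 0 1024 48900 63223 23029 42 40 18 0 1
"""


def parse_vmstat(text):
    """Return one dict per numeric data row; header lines are ignored."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(VMSTAT_COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(VMSTAT_COLUMNS, map(int, fields))))
    return rows


def mean(rows, column):
    """Arithmetic mean of one column across the given rows."""
    return sum(row[column] for row in rows) / len(rows)


# Drop the first sample: it reports averages since boot.
rows = parse_vmstat(SAMPLES)[1:]

for column in ("us", "sy", "id", "in", "cs"):
    print(f"{column}: {mean(rows, column):.1f}")
```

On these samples, system time averages nearly 39% against about 44% user time, with around 63,000 interrupts and 23,000 context switches per second, so a large share of the CPU is going to the kernel rather than to the broker's own code.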


*IO Stat:*

[root]# iostat -m
Linux 4.14.72-73.55.amzn2.x86_64 (loc-kafka11.internal.dnaspaces.io)  01/02/2020  _x86_64_  (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          38.11    0.00   33.09    0.11    0.61   28.08

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda              2.36         0.01         0.01      26760      43360
nvme0n1           0.00         0.00         0.00          2          0
xvdf             70.95         0.06         7.67     185908   25205338

*Top Kafka broker threads:*
[image: image.png]

*Top 3:*

"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0" #60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable [0x00007f8a886ce000]

"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-2" #62 prio=5 os_prio=0 tid=0x00007f8b1ab59000 nid=0x5821 runnable [0x00007f8a6aefd000]

"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-1" #61 prio=5 os_prio=0 tid=0x00007f8b1ab57800 nid=0x5820 runnable [0x00007f8a885cd000]
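
For anyone matching these thread-dump entries against per-thread CPU figures: the dump prints each native thread ID as hex (`nid=0x...`), while `top -H -p <pid>` lists thread IDs in decimal. A small Python sketch of the conversion follows. The function name `native_thread_ids` is hypothetical, and the input lines are the three headers above joined onto one line each.

```python
import re

# The three thread headers from the dump above, one per line.
THREAD_HEADERS = [
    '"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0" '
    '#60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable [0x00007f8a886ce000]',
    '"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-2" '
    '#62 prio=5 os_prio=0 tid=0x00007f8b1ab59000 nid=0x5821 runnable [0x00007f8a6aefd000]',
    '"data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-1" '
    '#61 prio=5 os_prio=0 tid=0x00007f8b1ab57800 nid=0x5820 runnable [0x00007f8a885cd000]',
]

# Capture the quoted thread name and the hex native thread ID.
NID_PATTERN = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def native_thread_ids(lines):
    """Map thread name -> decimal native thread ID (as shown by `top -H`)."""
    ids = {}
    for line in lines:
        match = NID_PATTERN.search(line)
        if match:
            ids[match.group("name")] = int(match.group("nid"), 16)
    return ids


for name, tid in native_thread_ids(THREAD_HEADERS).items():
    print(f"{tid}  {name}")
```

For the three threads above this gives 22559, 22561 and 22560, which are the IDs to look for in the per-thread `top` view.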

It doesn't look like GC or I/O is the problem: iowait is around 0.1%, and
the busiest threads are the broker's network threads.

Thanks
