Hi Navneeth

Regarding the bug you mentioned above: did you set sun.security.jgss.native = true?

If not, there are some items that need to be checked:

1. GC, although you said GC is not the problem.
2. If you suspect the network threads, how many threads did you configure
(num.network.threads)?
3. Have you enabled compression?
4. Did you change the value of batch.size on the producer side?
5. Could you increase "fetch.min.bytes" on the consumer side and
"replica.fetch.min.bytes" on the brokers to test whether CPU usage goes down?
6. You can check some JMX metrics for analysis, e.g.
"kafka.network:type=RequestMetrics,
name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}". If the
value is high, the CPU will be busy.
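
For item 6, here is a minimal sketch of reading those request rates over JMX.
It assumes the broker was started with remote JMX enabled (for example via
JMX_PORT); the class name, the host and the port in the usage note are
placeholders of mine, not values from your cluster. The ObjectName ends with a
wildcard because, as far as I know, newer brokers add extra key properties
(such as version) to this MBean, so the sketch sums over every match.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper class; the name is illustrative only.
public class RequestRateProbe {

    // Builds a pattern matching the RequestsPerSec MBean(s) for one request type.
    // The trailing ",*" also matches MBeans that carry additional key properties.
    public static ObjectName pattern(String request) throws Exception {
        return new ObjectName(
            "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=" + request + ",*");
    }

    // Sums the OneMinuteRate attribute over all MBeans matching the pattern.
    // Returns 0.0 when no matching MBean is registered.
    public static double oneMinuteRate(MBeanServerConnection conn, String request)
            throws Exception {
        double total = 0.0;
        for (ObjectName name : conn.queryNames(pattern(request), null)) {
            total += ((Number) conn.getAttribute(name, "OneMinuteRate")).doubleValue();
        }
        return total;
    }

    private static void report(MBeanServerConnection conn) throws Exception {
        for (String request : new String[] {"Produce", "FetchConsumer", "FetchFollower"}) {
            System.out.printf("%s: %.1f requests/sec%n", request, oneMinuteRate(conn, request));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No broker address given: query the local JVM (prints zeros outside a broker).
            report(ManagementFactory.getPlatformMBeanServer());
            return;
        }
        // args[0] is "host:port" of the broker's JMX endpoint (an assumption of this sketch).
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            report(connector.getMBeanServerConnection());
        }
    }
}
```

You would run it as "java RequestRateProbe broker-host:9999" (placeholder
address) on each broker and compare the three rates. Unusually high Produce or
Fetch request rates relative to the 10-16 Mbps of traffic would point to many
small requests, which is what items 4 and 5 try to reduce.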

Best,
Lisheng


On Wed, Jan 8, 2020 at 3:39 PM Navneeth Krishnan <reachnavnee...@gmail.com> wrote:

> Hi All,
>
> Any suggestions, we are running into this issue in production and any
> help would be greatly appreciated.
>
> Thanks
>
> On Mon, Jan 6, 2020 at 9:26 PM Navneeth Krishnan <reachnavnee...@gmail.com
> >
> wrote:
>
> > Hi,
> >
> > Thanks for the response. We were using version 0.11 previously and all our
> > producers/consumers have been upgraded to either 1.0 or to the latest 2.3.
> >
> > Is it normal for the network thread to consume more cpu? If you look at
> > it, the network thread consumes 50% of the overall cpu.
> >
> > Regards
> >
> > On Mon, Jan 6, 2020 at 7:04 PM Thunder Stumpges <
> > thunder.stump...@gmail.com> wrote:
> >
> >> Not sure what version your producers/consumers are, or if you upgraded from
> >> a previous version that used to work, or what, but maybe you're hitting
> >> this?
> >>
> >>
> >>
> https://kafka.apache.org/23/documentation.html#upgrade_10_performance_impact
> >>
> >>
> >>
> >> On Mon, Jan 6, 2020 at 12:48 PM Navneeth Krishnan <
> >> reachnavnee...@gmail.com>
> >> wrote:
> >>
> >> > Hi All,
> >> >
> >> > Any idea on what can be done? Not sure if we are running into the bug
> >> > below.
> >> >
> >> > https://issues.apache.org/jira/browse/KAFKA-7925
> >> >
> >> > Thanks
> >> >
> >> > On Thu, Jan 2, 2020 at 4:18 PM Navneeth Krishnan <
> >> reachnavnee...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi All,
> >> >>
> >> >> We have a kafka cluster with 12 nodes and we are pretty much seeing 90%
> >> >> cpu usage on all the nodes. Here is all the information. Need some help on
> >> >> figuring out what the problem is and how to overcome this issue.
> >> >>
> >> >> *Cluster:*
> >> >> Kafka version: 2.3.0
> >> >> Number of brokers in cluster: 12
> >> >> Node type: 4 vCores 32GB mem
> >> >> Network In: 10Mbps per broker
> >> >> Network Out: 16Mbps per broker
> >> >> Topics: 10 (approximately)
> >> >> Partitions: 20 (Max), some has only partitions
> >> >> Replication Factor: 3
> >> >>
> >> >> *CPU Usage:*
> >> >> [image: image.png]
> >> >>
> >> >> *VMStat*
> >> >>
> >> >> [root]# vmstat 1 10
> >> >>
> >> >> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> >> >>  r  b  swpd    free   buff     cache  si  so    bi     bo     in     cs  us  sy  id  wa  st
> >> >>  8  0     0  234444  19064  24046980   0   0    17   2026      1      3  38  33  28   0   1
> >> >>  7  0     0  256444  19036  24023880   0   0   768      0  64027  22708  44  40  16   0   1
> >> >>  7  0     0  245356  19052  24034560   0   0   256    472  63509  23276  44  39  17   0   1
> >> >>  7  0     0  235096  19052  24046616   0   0     0      0  62277  22516  46  38  15   0   1
> >> >>  8  0     0  260548  19036  24020084   0   0   516  49888  62364  22894  43  38  18   0   1
> >> >>  5  0     0  249232  19036  24030924   0   0   512      0  61022  24589  41  39  20   0   1
> >> >>  6  0     0  238072  19036  24042512   0   0  1024      0  63358  23063  44  38  17   0   0
> >> >>  5  0     0  262904  19052  24017972   0   0     0    440  63078  23499  46  37  17   0   1
> >> >>  7  0     0  250324  19052  24030008   0   0     0      0  64615  22617  48  38  14   0   1
> >> >>  6  0     0  237920  19052  24042372   0   0  1024  48900  63223  23029  42  40  18   0   1
> >> >>
> >> >>
> >> >> *IO Stat:*
> >> >>
> >> >> [root]# iostat -m
> >> >>
> >> >> Linux 4.14.72-73.55.amzn2.x86_64 (loc-kafka11.internal.dnaspaces.io)
> >> >> 01/02/2020        _x86_64_             (4 CPU)
> >> >>
> >> >> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >> >>           38.11    0.00   33.09    0.11    0.61   28.08
> >> >>
> >> >> Device:      tps  MB_read/s  MB_wrtn/s  MB_read   MB_wrtn
> >> >> xvda        2.36       0.01       0.01    26760     43360
> >> >> nvme0n1     0.00       0.00       0.00        2         0
> >> >> xvdf       70.95       0.06       7.67   185908  25205338
> >> >>
> >> >>
> >> >> *Top Kafka broker threads:*
> >> >> [image: image.png]
> >> >>
> >> >> *Top 3:*
> >> >>
> >> >> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0"
> >> >> #60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable
> >> >> [0x00007f8a886ce000]
> >> >>
> >> >> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-2"
> >> >> #62 prio=5 os_prio=0 tid=0x00007f8b1ab59000 nid=0x5821 runnable
> >> >> [0x00007f8a6aefd000]
> >> >>
> >> >> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-1"
> >> >> #61 prio=5 os_prio=0 tid=0x00007f8b1ab57800 nid=0x5820 runnable
> >> >> [0x00007f8a885cd000]
> >> >>
> >> >> It doesn't look like GC or IO is the problem.
> >> >>
> >> >> Thanks
> >> >>
> >> >
> >>
> >
>
