Hi Lisheng,

Here are the answers to your questions.
do you set sun.security.jgss.native = true?
- No.

1. GC, but you say gc is not problem
- I have verified GC multiple times and I don't see it being an issue.

2. if you suspect network thread, how many thread did you set?
- Currently there are 3 network threads and 8 I/O threads per broker.

3. if you enable compression
- No, compression is not enabled.

4. did you change the value of batch.size at producer side?
- No, there haven't been any recent changes on the producer side.

5. do you think you can increase "fetch.min.bytes" at consumer side and
"replica.fetch.min.bytes" at broker to test if cpu usage can be down?
- We haven't tried that yet. If it is likely to bring the CPU down we can
give it a try; a sketch of the consumer-side change I have in mind is below.

6. you can check some metrics from jmx, e.g.
"kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}"
- I don't see the RequestsPerSec metric in 2.3, but I do have
"kafka.network:type=RequestMetrics,name=TotalTimeMs":
  ProduceTotalTimeMs       - 1.25 ms
  FetchFollowerTotalTimeMs - 2.53 ms
  FetchConsumerTotalTimeMs - 12.5 ms
  (The JMX snippet after this list shows how I plan to query these.)
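For item 5, so we are testing the same thing, here is a minimal sketch of the
consumer-side change I have in mind. The class name, bootstrap server, group id
and the 64 KB trial value are placeholders rather than our real settings; on the
broker side the analogous knobs would be replica.fetch.min.bytes and
replica.fetch.wait.max.ms in server.properties.

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class FetchTuningSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder endpoint and group id, not our production values.
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "cpu-usage-test");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    StringDeserializer.class.getName());
            // Ask the broker to hold each fetch response until at least 64 KB is
            // available (the default is 1 byte), capped by fetch.max.wait.ms.
            props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 64 * 1024); // trial value
            props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);     // default, bounds added latency
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // subscribe() and the usual poll loop would go here.
            }
        }
    }

My understanding is that a larger fetch.min.bytes makes the broker answer fewer,
larger fetch requests, which should take some load off the network threads at the
cost of a little extra latency bounded by fetch.max.wait.ms.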
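And for item 6, this is roughly how I plan to read those request meters over JMX
once I can see them. It assumes the brokers run with JMX enabled on port 9999 and
that the meters expose the usual OneMinuteRate attribute; the wildcard pattern is
there because, if I understand correctly, newer brokers add an extra version tag
to the RequestsPerSec beans, which may be why I didn't spot them at first.

    import java.util.Set;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BrokerRequestMetrics {
        public static void main(String[] args) throws Exception {
            // Assumes the broker JVM was started with JMX enabled on port 9999.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Wildcard so beans with extra tags (e.g. a request version) still match.
                ObjectName pattern = new ObjectName(
                        "kafka.network:type=RequestMetrics,name=RequestsPerSec,*");
                Set<ObjectName> names = mbs.queryNames(pattern, null);
                for (ObjectName name : names) {
                    // Print the one-minute request rate for each Produce/Fetch meter found.
                    Object rate = mbs.getAttribute(name, "OneMinuteRate");
                    System.out.println(name + " OneMinuteRate=" + rate);
                }
            }
        }
    }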
Thanks.

On Wed, Jan 8, 2020 at 1:29 AM Lisheng Wang <wanglishen...@gmail.com> wrote:

> Hi Navneeth
>
> like the bug you said above, do you set sun.security.jgss.native = true?
>
> if not, there are some items that need to be checked.
>
> 1. GC, but you say gc is not problem
> 2. if you suspect network thread, how many thread did you set?
> 3. if you enable compression
> 4. did you change the value of batch.size at producer side?
> 5. do you think you can increase "fetch.min.bytes" at consumer side and
> "replica.fetch.min.bytes" at broker to test if cpu usage can be down?
> 6. you can check some metrics from jmx to analyze, e.g. checking
> "kafka.network:type=RequestMetrics,
> name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}", if
> the value is high, that means cpu will be busy.
>
> Best,
> Lisheng
>
>
> Navneeth Krishnan <reachnavnee...@gmail.com> wrote on Wed, Jan 8, 2020 at 3:39 PM:
>
> > Hi All,
> >
> > Any suggestions, we are running into this issue in production and any
> > help would be greatly appreciated.
> >
> > Thanks
> >
> > On Mon, Jan 6, 2020 at 9:26 PM Navneeth Krishnan <reachnavnee...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Thanks for the response. We were using version 0.11 previously and all
> > > our producers/consumers have been upgraded to either 1.0 or to the
> > > latest 2.3.
> > >
> > > Is it normal for the network thread to consume more cpu? If you look at
> > > it, the network thread consumes 50% of the overall cpu.
> > >
> > > Regards
> > >
> > > On Mon, Jan 6, 2020 at 7:04 PM Thunder Stumpges <
> > > thunder.stump...@gmail.com> wrote:
> > >
> > >> Not sure what version your producers/consumers are, or if you upgraded
> > >> from a previous version that used to work, or what, but maybe you're
> > >> hitting this?
> > >>
> > >> https://kafka.apache.org/23/documentation.html#upgrade_10_performance_impact
> > >>
> > >> On Mon, Jan 6, 2020 at 12:48 PM Navneeth Krishnan <
> > >> reachnavnee...@gmail.com> wrote:
> > >>
> > >> > Hi All,
> > >> >
> > >> > Any idea on what can be done? Not sure if we are running into the
> > >> > bug below.
> > >> >
> > >> > https://issues.apache.org/jira/browse/KAFKA-7925
> > >> >
> > >> > Thanks
> > >> >
> > >> > On Thu, Jan 2, 2020 at 4:18 PM Navneeth Krishnan <
> > >> > reachnavnee...@gmail.com> wrote:
> > >> >
> > >> >> Hi All,
> > >> >>
> > >> >> We have a kafka cluster with 12 nodes and we are pretty much seeing
> > >> >> 90% cpu usage on all the nodes. Here is all the information. Need
> > >> >> some help on figuring out what the problem is and how to overcome
> > >> >> this issue.
> > >> >>
> > >> >> *Cluster:*
> > >> >> Kafka version: 2.3.0
> > >> >> Number of brokers in cluster: 12
> > >> >> Node type: 4 vCores, 32 GB mem
> > >> >> Network In: 10 Mbps per broker
> > >> >> Network Out: 16 Mbps per broker
> > >> >> Topics: 10 (approximately)
> > >> >> Partitions: 20 (max), some topics have fewer partitions
> > >> >> Replication Factor: 3
> > >> >>
> > >> >> *CPU Usage:*
> > >> >> [image: image.png]
> > >> >>
> > >> >> *VMStat*
> > >> >>
> > >> >> [root]# vmstat 1 10
> > >> >> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> > >> >>  r  b swpd   free   buff    cache   si  so   bi    bo    in    cs us sy id wa st
> > >> >>  8  0    0 234444  19064 24046980   0   0   17  2026     1     3 38 33 28  0  1
> > >> >>  7  0    0 256444  19036 24023880   0   0  768     0 64027 22708 44 40 16  0  1
> > >> >>  7  0    0 245356  19052 24034560   0   0  256   472 63509 23276 44 39 17  0  1
> > >> >>  7  0    0 235096  19052 24046616   0   0    0     0 62277 22516 46 38 15  0  1
> > >> >>  8  0    0 260548  19036 24020084   0   0  516 49888 62364 22894 43 38 18  0  1
> > >> >>  5  0    0 249232  19036 24030924   0   0  512     0 61022 24589 41 39 20  0  1
> > >> >>  6  0    0 238072  19036 24042512   0   0 1024     0 63358 23063 44 38 17  0  0
> > >> >>  5  0    0 262904  19052 24017972   0   0    0   440 63078 23499 46 37 17  0  1
> > >> >>  7  0    0 250324  19052 24030008   0   0    0     0 64615 22617 48 38 14  0  1
> > >> >>  6  0    0 237920  19052 24042372   0   0 1024 48900 63223 23029 42 40 18  0  1
> > >> >>
> > >> >> *IO Stat:*
> > >> >>
> > >> >> [root]# iostat -m
> > >> >> Linux 4.14.72-73.55.amzn2.x86_64 (loc-kafka11.internal.dnaspaces.io)  01/02/2020  _x86_64_  (4 CPU)
> > >> >>
> > >> >> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> > >> >>           38.11    0.00   33.09    0.11    0.61   28.08
> > >> >>
> > >> >> Device:     tps   MB_read/s   MB_wrtn/s   MB_read    MB_wrtn
> > >> >> xvda       2.36        0.01        0.01     26760      43360
> > >> >> nvme0n1    0.00        0.00        0.00         2          0
> > >> >> xvdf      70.95        0.06        7.67    185908   25205338
> > >> >>
> > >> >> *Top Kafka broker threads:*
> > >> >> [image: image.png]
> > >> >>
> > >> >> *Top 3:*
> > >> >>
> > >> >> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-0"
> > >> >> #60 prio=5 os_prio=0 tid=0x00007f8b1ab56000 nid=0x581f runnable [0x00007f8a886ce000]
> > >> >>
> > >> >> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-2"
> > >> >> #62 prio=5 os_prio=0 tid=0x00007f8b1ab59000 nid=0x5821 runnable [0x00007f8a6aefd000]
> > >> >>
> > >> >> "data-plane-kafka-network-thread-10-ListenerName(PLAINTEXT)-PLAINTEXT-1"
> > >> >> #61 prio=5 os_prio=0 tid=0x00007f8b1ab57800 nid=0x5820 runnable [0x00007f8a885cd000]
> > >> >>
> > >> >> It doesn't look like GC or I/O is the problem.
> > >> >>
> > >> >> Thanks