thanks, the new Kafka cluster is stable now, but I'd still like to know the root cause.

On Sat, Mar 19, 2016 at 3:34 PM, Stevo Slavić <ssla...@gmail.com> wrote:

> Consider upgrading to 0.9.0.1 - our cluster has been much more stable
> since the upgrade. Clients can be upgraded later.
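>
> For a rolling upgrade, the 0.9 upgrade notes have you pin the inter-broker
> protocol in server.properties while old and new brokers coexist, then bump
> it once every broker is on 0.9.0.1 and bounce once more:
>
> # during the rolling bounce (the docs write this as 0.8.2.X)
> inter.broker.protocol.version=0.8.2
> # after all brokers are upgraded, set this and restart one by one
> inter.broker.protocol.version=0.9.0.0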
>
> On Fri, Mar 18, 2016, 23:22 li jinyu <disorder...@gmail.com> wrote:
>
> > I want to add more information.
> >
> > we've already set up a completely new Kafka cluster with 0.8.2.2, and the
> > old cluster is still there with no consumers or producers; it's really
> > weird that its CPU usage is still high.
> >
> > We also changed fetch.purgatory.purge.interval.requests and
> > producer.purgatory.purge.interval.requests to 100, as I saw high CPU
> > usage in FetchRequestPurgatory.expire(), but no luck.
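> >
> > For reference, this is how the two knobs look in our server.properties;
> > 100 is just the value we tried, and if I remember correctly the 0.8.2
> > default for both is 1000 (lower values make the purgatories purge more
> > often):
> >
> > # purge the fetch request purgatory after this many requests
> > fetch.purgatory.purge.interval.requests=100
> > # purge the producer request purgatory after this many requests
> > producer.purgatory.purge.interval.requests=100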
> >
> >
> > On Fri, Mar 18, 2016 at 7:07 PM, li jinyu <disorder...@gmail.com> wrote:
> >
> > > hi all,
> > >
> > > we encountered a problem with 0.8.1.1: lots of BadVersion errors in the
> > > log. We believe it's caused by
> > > https://issues.apache.org/jira/browse/KAFKA-1382, so we upgraded Kafka
> > > to 0.8.2.2.
> > >
> > > but we found that Kafka is using a lot of CPU. The following is the
> > > output of the 'top' command:
> > >   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
> > > 16781 root      20   0 9072m 874m  17m S 975.5  5.5 471:27.56 java
> > >
> > > I tried to use HPROF to measure the CPU usage and got this result:
> > > CPU SAMPLES BEGIN (total = 139166340) Fri Mar 18 16:30:28 2016
> > > rank   self  accum    count  trace method
> > >    1 37.63% 37.63% 52372145 325245 sun.nio.ch.EPollArrayWrapper.epollWait
> > >    2 33.37% 71.00% 46441166 319805 sun.nio.ch.EPollArrayWrapper.epollWait
> > >    3  8.49% 79.50% 11819885 302339 java.net.PlainSocketImpl.socketAccept
> > >    4  4.25% 83.74%  5909221 302910 java.net.PlainSocketImpl.socketAccept
> > >    5  4.23% 87.98%  5891650 310256 sun.nio.ch.EPollArrayWrapper.epollWait
> > >    6  4.21% 92.19%  5859357 319905 sun.nio.ch.EPollArrayWrapper.epollWait
> > >    7  4.16% 96.34%  5782672 336990 java.net.SocketInputStream.socketRead0
> > >    8  1.99% 98.33%  2762683 320651 sun.misc.Unsafe.park
> > >    9  0.27% 98.60%   375233 319447 sun.misc.Unsafe.park
> > >   10  0.25% 98.85%   351701 320536 sun.misc.Unsafe.park
> > >   11  0.24% 99.09%   329232 329274 sun.misc.Unsafe.park
> > >   12  0.08% 99.17%   109428 325405 sun.nio.ch.EPollArrayWrapper.epollCtl
> > >   13  0.05% 99.22%    69762 329549 sun.nio.ch.EPollArrayWrapper.interrupt
> > >   14  0.03% 99.25%    45197 328439 kafka.network.RequestChannel$Request.updateRequestMetrics
> > >   15  0.03% 99.28%    38496 328333 sun.nio.ch.FileDispatcherImpl.write0
> > >   16  0.02% 99.30%    33307 337909 java.lang.System.identityHashCode
> > >   17  0.02% 99.32%    31783 328763 sun.nio.ch.NativeThread.current
> > >   18  0.02% 99.34%    30917 322537 sun.misc.Unsafe.unpark
> > >   19  0.02% 99.37%    29943 325979 sun.nio.ch.EPollArrayWrapper.epollWait
> > >   20  0.02% 99.39%    27788 328474 sun.nio.ch.FileDispatcherImpl.write0
> > >   21  0.02% 99.40%    26448 328397 sun.nio.ch.FileDispatcherImpl.write0
> > >   22  0.02% 99.42%    25177 329364 scala.collection.mutable.HashTable$class.$init$
> > >   23  0.02% 99.44%    24230 328786 java.lang.Thread.currentThread
> > >   24  0.02% 99.46%    22683 328795 java.lang.Thread.currentThread
> > >   25  0.01% 99.47%    16264 328092 sun.nio.ch.IOUtil.drain
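> > >
> > > For anyone who wants to reproduce the measurement: the HPROF agent can
> > > be attached to the broker JVM via KAFKA_OPTS, for example (the interval
> > > and depth values here are only illustrative):
> > >
> > > export KAFKA_OPTS="-agentlib:hprof=cpu=samples,interval=10,depth=10"
> > > bin/kafka-server-start.sh config/server.properties
> > > # samples are written to java.hprof.txt when the JVM exits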
> > >
> > > epollWait mostly just blocks waiting for I/O, so it shouldn't really be
> > > this expensive; I suspect this is an artifact of how the sampling
> > > statistics are collected.
> > > then I tried YourKit and found that the heaviest method is
> > > FetchRequestPurgatory.expire(). I uploaded a screenshot from YourKit:
> > > http://imgur.com/2tXjhBI
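> > >
> > > For what it's worth, my rough mental model of why expire() gets hot is
> > > below. This is a heavily simplified, hypothetical sketch, not Kafka's
> > > actual code: delayed fetches sit in a DelayQueue, a reaper thread runs
> > > expire() on each one that times out, and every purge.interval.requests
> > > completions it rescans all watcher lists, which is linear in the number
> > > of outstanding requests.
> > >
> > > import java.util.concurrent.{DelayQueue, Delayed, TimeUnit}
> > > import scala.collection.mutable
> > >
> > > // A fetch that is parked until data arrives or its timeout expires.
> > > class DelayedFetch(val deadlineMs: Long) extends Delayed {
> > >   @volatile var satisfied = false
> > >   def getDelay(unit: TimeUnit): Long =
> > >     unit.convert(deadlineMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS)
> > >   def compareTo(other: Delayed): Int =
> > >     java.lang.Long.compare(getDelay(TimeUnit.MILLISECONDS),
> > >                            other.getDelay(TimeUnit.MILLISECONDS))
> > > }
> > >
> > > class SketchPurgatory(purgeInterval: Int) {
> > >   private val expiryQueue = new DelayQueue[DelayedFetch]()
> > >   private val watchers = mutable.Map.empty[String, mutable.ListBuffer[DelayedFetch]]
> > >   private var completed = 0
> > >
> > >   // Register a delayed fetch under a topic-partition key.
> > >   def watch(key: String, request: DelayedFetch): Unit = {
> > >     watchers.getOrElseUpdate(key, mutable.ListBuffer.empty) += request
> > >     expiryQueue.put(request)
> > >   }
> > >
> > >   // Reaper loop: the shape of the work YourKit attributes to expire().
> > >   def reaperLoop(expire: DelayedFetch => Unit): Unit =
> > >     while (!Thread.currentThread().isInterrupted) {
> > >       val request = expiryQueue.take()        // blocks until a fetch times out
> > >       if (!request.satisfied) expire(request) // e.g. send an empty fetch response
> > >       completed += 1
> > >       if (completed >= purgeInterval) {       // the knob we lowered to 100
> > >         // Full rescan of every watcher list to drop satisfied requests:
> > >         // O(all outstanding requests) on every purge.
> > >         watchers.values.foreach(_.filterInPlace(r => !r.satisfied))
> > >         completed = 0
> > >       }
> > >     }
> > > }
> > >
> > > If that model is roughly right, lowering the interval to 100 only changes
> > > how often the rescan runs, not the per-expiration work, which might be
> > > why it didn't help us.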
> > >
> > > does anyone have the same problem?
> > >
> > > --
> > > Don't schedule every day, make them disorder.
> > >
> >
> >
> >
> > --
> > Don't schedule every day, make them disorder.
> >
>



-- 
Don't schedule every day, make them disorder.
