Thanks. I managed to get a cpu dump from staging.
The output:
THREAD START (obj=50000427, id = 200004, name="RMI TCP Accept-0", group="system")
THREAD START (obj=50000427, id = 200001, name="main", group="main")
THREAD START (obj=50000427, id = 200005, name="SensorExpiryThread", group="main")
THREAD START (obj=500008e6, id = 200006, name="ThrottledRequestReaper-Fetch", group="main")
THREAD START (obj=500008e6, id = 200007, name="ThrottledRequestReaper-Produce", group="main")
THREAD START (obj=50000914, id = 200008, name="ZkClient-EventThread-18-zookeeper:2181", group="main")
THREAD START (obj=500008e6, id = 200009, name="main-SendThread()", group="main")
THREAD START (obj=50000950, id = 200010, name="main-EventThread", group="main")
THREAD START (obj=50000427, id = 200011, name="pool-3-thread-1", group="main")
THREAD END (id = 200011)
THREAD START (obj=50000427, id = 200012, name="metrics-meter-tick-thread-1", group="main")
THREAD START (obj=50000427, id = 200014, name="kafka-scheduler-0", group="main")
THREAD START (obj=50000427, id = 200013, name="kafka-scheduler-1", group="main")
THREAD START (obj=50000427, id = 200015, name="kafka-scheduler-2", group="main")
THREAD START (obj=50000c33, id = 200016, name="kafka-log-cleaner-thread-0", group="main")
THREAD START (obj=50000427, id = 200017, name="kafka-network-thread-2-PLAINTEXT-0", group="main")
THREAD START (obj=50000427, id = 200018, name="kafka-network-thread-2-PLAINTEXT-1", group="main")
THREAD START (obj=50000427, id = 200019, name="kafka-network-thread-2-PLAINTEXT-2", group="main")
THREAD START (obj=50000427, id = 200020, name="kafka-socket-acceptor-PLAINTEXT-9092", group="main")
THREAD START (obj=500008e6, id = 200021, name="ExpirationReaper-2", group="main")
THREAD START (obj=500008e6, id = 200022, name="ExpirationReaper-2", group="main")
THREAD START (obj=50000427, id = 200023, name="metrics-meter-tick-thread-2", group="main")
THREAD START (obj=50000427, id = 200024, name="kafka-scheduler-3", group="main")
THREAD START (obj=50000427, id = 200025, name="kafka-scheduler-4", group="main")
THREAD START (obj=50000427, id = 200026, name="kafka-scheduler-5", group="main")
THREAD START (obj=50000427, id = 200027, name="kafka-scheduler-6", group="main")
THREAD START (obj=500008e6, id = 200028, name="ExpirationReaper-2", group="main")
THREAD START (obj=500008e6, id = 200029, name="ExpirationReaper-2", group="main")
THREAD START (obj=500008e6, id = 200030, name="ExpirationReaper-2", group="main")
THREAD START (obj=50000427, id = 200031, name="group-metadata-manager-0", group="main")
THREAD START (obj=50000427, id = 200032, name="kafka-request-handler-0", group="main")
THREAD START (obj=50000427, id = 200037, name="kafka-request-handler-5", group="main")
THREAD START (obj=50000427, id = 200036, name="kafka-request-handler-4", group="main")
THREAD START (obj=50000427, id = 200035, name="kafka-request-handler-3", group="main")
THREAD START (obj=50000427, id = 200034, name="kafka-request-handler-2", group="main")
THREAD START (obj=50000427, id = 200033, name="kafka-request-handler-1", group="main")
THREAD START (obj=50000427, id = 200038, name="kafka-request-handler-6", group="main")
THREAD START (obj=50000427, id = 200039, name="kafka-request-handler-7", group="main")
THREAD START (obj=50000427, id = 200040, name="kafka-scheduler-7", group="main")
THREAD START (obj=50000427, id = 200041, name="kafka-scheduler-8", group="main")
THREAD START (obj=50000ee2, id = 200042, name="ReplicaFetcherThread-0-0", group="main")
THREAD START (obj=50000ee2, id = 200043, name="ReplicaFetcherThread-0-1", group="main")
THREAD START (obj=50000427, id = 200044, name="kafka-scheduler-9", group="main")
THREAD START (obj=50000427, id = 200045, name="executor-Fetch", group="main")
TRACE 300920:
sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
TRACE 300518:
java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
java.net.ServerSocket.implAccept(ServerSocket.java:545)
java.net.ServerSocket.accept(ServerSocket.java:513)
TRACE 300940:
sun.nio.ch.FileDispatcherImpl.write0(FileDispatcherImpl.java:Unknown line)
sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
sun.nio.ch.IOUtil.write(IOUtil.java:65)
TRACE 301003:
org.xerial.snappy.SnappyNative.rawUncompress(SnappyNative.java:Unknown line)
org.xerial.snappy.Snappy.rawUncompress(Snappy.java:474)
org.xerial.snappy.Snappy.uncompress(Snappy.java:513)
org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:147)
TRACE 300979:
sun.nio.ch.FileDispatcherImpl.pread0(FileDispatcherImpl.java:Unknown line)
sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
sun.nio.ch.IOUtil.read(IOUtil.java:197)
TRACE 301630:
sun.nio.ch.EPollArrayWrapper.epollCtl(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:299)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:268)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
TRACE 301259:
sun.misc.Unsafe.unpark(Unsafe.java:Unknown line)
java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)
java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1264)
TRACE 301559:
sun.nio.ch.FileDispatcherImpl.read0(FileDispatcherImpl.java:Unknown line)
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
sun.nio.ch.IOUtil.read(IOUtil.java:197)
TRACE 300590:
java.lang.ClassLoader.defineClass1(ClassLoader.java:Unknown line)
java.lang.ClassLoader.defineClass(ClassLoader.java:763)
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
TRACE 301643:
scala.Tuple2.equals(Tuple2.scala:20)
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:940)
kafka.utils.Pool.get(Pool.scala:69)
kafka.server.ReplicaManager.getPartition(ReplicaManager.scala:280)
TRACE 300592:
java.util.zip.ZipFile.read(ZipFile.java:Unknown line)
java.util.zip.ZipFile.access$1400(ZipFile.java:60)
java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:717)
java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:419)
TRACE 301018:
kafka.utils.CoreUtils$.crc32(CoreUtils.scala:148)
kafka.message.Message.computeChecksum(Message.scala:216)
kafka.message.Message.isValid(Message.scala:226)
kafka.message.Message.ensureValid(Message.scala:232)
TRACE 301561:
java.io.FileDescriptor.sync(FileDescriptor.java:Unknown line)
kafka.server.OffsetCheckpoint.liftedTree1$1(OffsetCheckpoint.scala:62)
kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:49)
kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:945)
TRACE 301422:
java.util.Arrays.copyOf(Arrays.java:3332)
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
CPU SAMPLES BEGIN (total = 699214) Thu Mar 23 12:41:17 2017
rank self accum count trace method
1 86.46% 86.46% 604544 300920 sun.nio.ch.EPollArrayWrapper.epollWait
2 12.62% 99.08% 88254 300518 java.net.PlainSocketImpl.socketAccept
3 0.11% 99.19% 759 300940 sun.nio.ch.FileDispatcherImpl.write0
4 0.04% 99.23% 253 301003 org.xerial.snappy.SnappyNative.rawUncompress
5 0.03% 99.26% 231 300979 sun.nio.ch.FileDispatcherImpl.pread0
6 0.03% 99.29% 220 301630 sun.nio.ch.EPollArrayWrapper.epollCtl
7 0.03% 99.32% 219 301259 sun.misc.Unsafe.unpark
8 0.02% 99.34% 145 301559 sun.nio.ch.FileDispatcherImpl.read0
9 0.01% 99.36% 89 300590 java.lang.ClassLoader.defineClass1
10 0.01% 99.37% 87 301643 scala.Tuple2.equals
11 0.01% 99.38% 79 300592 java.util.zip.ZipFile.read
12 0.01% 99.39% 79 301018 kafka.utils.CoreUtils$.crc32
13 0.01% 99.40% 78 301561 java.io.FileDescriptor.sync
14 0.01% 99.41% 72 301422 java.util.Arrays.copyOf
CPU SAMPLES END
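To make the table easier to read, here is a minimal Python sketch (not part of the hprof output) that parses `CPU SAMPLES` rows and adds up the share of samples whose top frame is in a chosen set of methods. The function names and the choice of which methods count as "waiting" are assumptions for illustration. As far as I know, hprof's sampler also counts threads that are sitting in native calls such as `epollWait` and `socketAccept`, so a large share there probably reflects idle waiting rather than CPU actually being burned.

```python
import re

# One data row of an hprof "CPU SAMPLES" table:
# rank, self %, accum %, count, trace id, method name.
ROW = re.compile(r"^\s*(\d+)\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+(\d+)\s+(\S+)\s*$")


def parse_cpu_samples(text):
    """Return (total, rows), where rows is a list of (count, trace, method)."""
    total = None
    rows = []
    for line in text.splitlines():
        header = re.match(r"^CPU SAMPLES BEGIN \(total = (\d+)\)", line)
        if header:
            total = int(header.group(1))
            continue
        row = ROW.match(line)
        if row:
            rows.append((int(row.group(4)), int(row.group(5)), row.group(6)))
    return total, rows


def share(total, rows, methods):
    """Fraction of all samples whose top frame is one of `methods`."""
    return sum(count for count, _, method in rows if method in methods) / total


# Assumption: these two top frames are treated as "waiting", not working.
WAITING = {
    "sun.nio.ch.EPollArrayWrapper.epollWait",
    "java.net.PlainSocketImpl.socketAccept",
}

# First three rows of the table above, used as sample input.
sample = """CPU SAMPLES BEGIN (total = 699214) Thu Mar 23 12:41:17 2017
rank self accum count trace method
1 86.46% 86.46% 604544 300920 sun.nio.ch.EPollArrayWrapper.epollWait
2 12.62% 99.08% 88254 300518 java.net.PlainSocketImpl.socketAccept
3 0.11% 99.19% 759 300940 sun.nio.ch.FileDispatcherImpl.write0
CPU SAMPLES END"""

total, rows = parse_cpu_samples(sample)
print(f"{share(total, rows, WAITING):.2%} of samples are in waiting frames")
```

If that reading is right, this dump says little about where the 20% CPU actually goes, and per-thread CPU figures would be more telling.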
It seems like the interval between the constant disconnects is far bigger
than the 10-minute default. I suspect this has something to do with the
double connects, which I'm not sure how to get around.
On Thu, Mar 23, 2017 at 11:46 AM, Manikumar <manikumar.re...@gmail.com>
wrote:
1. Maybe you can monitor per-thread CPU usage and correlate it with a
thread dump to identify the bottleneck.
2. The broker config property connections.max.idle.ms is used to close
idle connections. The default is 10 minutes.
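Point 1 could be sketched roughly as follows. On Linux, `top -H -p <pid>` lists per-thread CPU usage with decimal thread ids, while `jstack <pid>` prints the same ids in hex as `nid=0x...`. The helper below maps a decimal id to the thread name in a jstack dump. The function names and the sample dump are made up for illustration; real input would come from the broker's own jstack output.

```python
import re

# Matches the first line of a jstack thread entry, e.g.
# "kafka-request-handler-0" #45 prio=5 os_prio=0 tid=0x... nid=0x1a3c waiting on condition
THREAD_LINE = re.compile(r'^"(?P<name>[^"]+)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)')


def threads_by_native_id(jstack_output):
    """Map native thread id (as an int) to thread name."""
    result = {}
    for line in jstack_output.splitlines():
        m = THREAD_LINE.match(line)
        if m:
            result[int(m.group("nid"), 16)] = m.group("name")
    return result


def name_for_tid(jstack_output, tid):
    """Look up the thread name for a decimal thread id as shown by `top -H`."""
    return threads_by_native_id(jstack_output).get(tid)


# Made-up sample dump for illustration only.
sample_dump = '''"kafka-network-thread-2-PLAINTEXT-0" #31 prio=5 os_prio=0 tid=0x00007f0000000001 nid=0x1a2b runnable [0x00007f0000001000]
   java.lang.Thread.State: RUNNABLE
"kafka-request-handler-0" #45 prio=5 os_prio=0 tid=0x00007f0000000002 nid=0x1a3c waiting on condition [0x00007f0000002000]
   java.lang.Thread.State: WAITING (parking)
'''

# Suppose `top -H` showed thread 6699 (hex 0x1a2b) as the busiest one.
print(name_for_tid(sample_dump, 6699))
```

Once the busiest thread is identified by name, its stack in the same dump shows what it was doing at that moment; taking a few dumps some seconds apart gives a more reliable picture.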
On Thu, Mar 23, 2017 at 3:55 PM, Paul van der Linden <p...@sportr.co.uk>
wrote:
Hi,
I deployed Kafka about a week ago, but there are a few problems with how
Kafka behaves.
The first is the surprisingly high resource usage. One part is the memory
(1.5-2 GB for each broker, 3 brokers), although this might be normal. The
other is the CPU usage, which starts at a minimum of 20% on each broker,
which I find strange given the current throughput (< 1 msg/s).
This might have something to do with something else I find strange:
Kafka disconnects clients about every 10-20 minutes per broker. It might
be related to the configuration: Kafka is deployed in Kubernetes, and
clients bootstrap with a single DNS name (which is backed by all live
Kafka brokers); every broker then has a separate DNS address which is used
after the bootstrap. This means that a client is connected twice to one of
the brokers. The reason for the bootstrap DNS name is to make sure I don't
have to update all clients to include other brokers.
Any advice on how to solve these 2 problems?
Thanks,
Paul