So I suspect that the bottleneck is actually in the writer thread (the one calling send()), not the I/O thread. You could verify this by checking the JMX stats, which report how much time the I/O thread spends waiting. The epollWait that shows up near the top of the trace is just the I/O thread waiting for work, which points the same way.
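If it is easier than wiring up a JMX viewer, here is a minimal sketch that reads the same stats through the new producer's metrics() API. The waiting-related entries I have in mind are something like io-wait-ratio / io-wait-time-ns-avg in the producer-metrics group, but the exact metric names can vary by client version, so this just prints everything and you can grep for "wait":

import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public final class ProducerMetricsDump {
    // Print every client-side metric; the io-wait entries show how much time the
    // sender (I/O) thread spends idle versus actually writing to the sockets.
    public static void dump(KafkaProducer<?, ?> producer) {
        for (Map.Entry<MetricName, ? extends Metric> e : producer.metrics().entrySet()) {
            MetricName name = e.getKey();
            System.out.printf("%s : %s = %f%n", name.group(), name.name(), e.getValue().value());
        }
    }
}

Call dump(producer) periodically from the thread driving send(); if the io-wait numbers are high, the sender thread is starved for work and the caller is the bottleneck.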
It looks like the big bottleneck is computing the crc32s for the messages. The next big hit after that is signaling the I/O thread to wake up and do work. Here is an annotated version of those traces:

These two are bogus and are just background JMX things I think:
 1 39.30% 39.30% 79585 300923 java.net.SocketInputStream.socketRead0
 2 20.62% 59.92% 41750 300450 java.net.PlainSocketImpl.socketAccept

This is the I/O thread waiting for work to do:
 3 9.52% 69.45% 19287 300660 sun.nio.ch.EPollArrayWrapper.epollWait

These are the real problems:
 4 9.50% 78.94% 19234 300728 org.apache.kafka.common.record.Record.computeChecksum
 5 4.14% 83.08% 8377 300777 sun.nio.ch.EPollArrayWrapper.interrupt

I/O thread doing a write:
 6 2.30% 85.38% 4662 300708 sun.nio.ch.FileDispatcherImpl.writev0

This is a one-time thing when fetching metadata on startup:
 7 1.61% 86.99% 3260 300752 org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata

These are all in the I/O thread so not relevant:
 8 1.24% 88.23% 2501 300804 sun.nio.ch.EPollArrayWrapper.epollWait
 9 1.08% 89.31% 2187 300734 org.apache.kafka.clients.producer.internals.RecordBatch.done
10 0.98% 90.29% 1991 300870 org.apache.kafka.common.protocol.types.Type$6.write
11 0.97% 91.26% 1961 300789 org.apache.kafka.clients.producer.internals.RecordAccumulator.ready
12 0.96% 92.22% 1951 300726 org.apache.kafka.common.record.MemoryRecords.append
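To put a number on the checksum cost: as the trace shows, the CRC is computed over each record's bytes in the caller's thread on the send() path, so at ~300 MB/s of 1000-byte messages that is roughly 300K checksums per second. Here is a rough standalone sketch of that cost, using the JDK's java.util.zip.CRC32 rather than the client's own CRC implementation, so the absolute numbers will differ; the message size and count are just placeholders taken from the benchmark below:

import java.util.zip.CRC32;

public class CrcCostSketch {
    public static void main(String[] args) {
        byte[] payload = new byte[1000];   // roughly the benchmark's 1000-byte messages
        long messages = 1_000_000;

        long start = System.nanoTime();
        long blackhole = 0;
        for (long i = 0; i < messages; i++) {
            CRC32 crc = new CRC32();
            crc.update(payload, 0, payload.length);   // one checksum per record
            blackhole += crc.getValue();
        }
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d crcs in %.2fs = %.0f MB/s of checksummed payload (ignore: %d)%n",
                messages, secs, messages * payload.length / secs / 1e6, blackhole);
    }
}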
-Jay

On Thu, Nov 20, 2014 at 5:42 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:

> Ok, here is the hprof output
>
> CPU SAMPLES BEGIN (total = 202493) Fri Nov 21 08:07:51 2014
> rank self accum count trace method
>  1 39.30% 39.30% 79585 300923 java.net.SocketInputStream.socketRead0
>  2 20.62% 59.92% 41750 300450 java.net.PlainSocketImpl.socketAccept
>  3 9.52% 69.45% 19287 300660 sun.nio.ch.EPollArrayWrapper.epollWait
>  4 9.50% 78.94% 19234 300728 org.apache.kafka.common.record.Record.computeChecksum
>  5 4.14% 83.08% 8377 300777 sun.nio.ch.EPollArrayWrapper.interrupt
>  6 2.30% 85.38% 4662 300708 sun.nio.ch.FileDispatcherImpl.writev0
>  7 1.61% 86.99% 3260 300752 org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>  8 1.24% 88.23% 2501 300804 sun.nio.ch.EPollArrayWrapper.epollWait
>  9 1.08% 89.31% 2187 300734 org.apache.kafka.clients.producer.internals.RecordBatch.done
> 10 0.98% 90.29% 1991 300870 org.apache.kafka.common.protocol.types.Type$6.write
> 11 0.97% 91.26% 1961 300789 org.apache.kafka.clients.producer.internals.RecordAccumulator.ready
> 12 0.96% 92.22% 1951 300726 org.apache.kafka.common.record.MemoryRecords.append
> 13 0.89% 93.12% 1809 300829 java.nio.Bits.copyFromArray
> 14 0.75% 93.86% 1510 300722 java.nio.HeapByteBuffer.<init>
> 15 0.54% 94.41% 1100 300730 org.apache.kafka.common.record.Compressor.put
> 16 0.54% 94.95% 1094 300749 org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> 17 0.38% 95.33% 771 300755 org.apache.kafka.clients.producer.KafkaProducer.send
> 18 0.36% 95.69% 736 300830 org.apache.kafka.common.metrics.Sensor.record
> 19 0.35% 96.04% 709 300848 sun.nio.ch.IOUtil.drain
> 20 0.33% 96.37% 665 300814 sun.nio.ch.IOUtil.drain
> 21 0.32% 96.69% 644 300812 org.apache.kafka.common.metrics.Sensor.record
> 22 0.31% 97.00% 626 300725 org.apache.kafka.clients.producer.internals.Partitioner.partition
> 23 0.28% 97.28% 571 300729 org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> 24 0.26% 97.54% 535 300764 org.apache.log4j.Category.getEffectiveLevel
> 25 0.25% 97.79% 501 300924 org.apache.kafka.common.protocol.types.Schema.write
> 26 0.19% 97.98% 392 300802 org.apache.kafka.common.metrics.Sensor.record
> 27 0.19% 98.17% 386 300797 org.apache.kafka.common.metrics.Sensor.record
> 28 0.17% 98.34% 342 300739 org.apache.kafka.common.record.Record.write
> 29 0.16% 98.50% 315 300792 org.apache.kafka.common.record.Record.write
> 30 0.15% 98.64% 294 300757 org.apache.kafka.common.record.Record.write
> 31 0.12% 98.76% 238 300731 org.apache.kafka.common.record.Record.write
> 32 0.09% 98.85% 180 300747 org.apache.kafka.clients.producer.KafkaProducer.send
> 33 0.09% 98.94% 177 300750 org.apache.kafka.common.record.Record.write
> 34 0.06% 98.99% 112 300851 sun.nio.ch.NativeThread.current
> 35 0.05% 99.05% 110 300753 org.apache.kafka.clients.producer.KafkaProducer.send
> 36 0.05% 99.09% 93 300723 java.lang.System.arraycopy
> 37 0.04% 99.13% 80 300872 org.apache.kafka.clients.tools.ProducerPerformance.main
> 38 0.04% 99.17% 78 300770 java.util.HashMap.getEntry
> 39 0.04% 99.21% 78 300859 org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> 40 0.04% 99.25% 73 300861 sun.nio.ch.EPollArrayWrapper.epollCtl
> 41 0.03% 99.28% 67 300718 sun.misc.Unsafe.copyMemory
> 42 0.03% 99.31% 59 300737 org.apache.kafka.clients.producer.internals.Metadata.timeToNextUpdate
> 43 0.03% 99.33% 52 300816 org.apache.kafka.clients.producer.internals.Metadata.fetch
> 44 0.02% 99.36% 48 300715 sun.nio.ch.FileDispatcherImpl.read0
> 45 0.02% 99.38% 42 300794 org.apache.log4j.Category.getEffectiveLevel
> 46 0.02% 99.40% 41 300740 org.apache.kafka.clients.producer.internals.Metadata.timeToNextUpdate
> 47 0.02% 99.42% 40 300795 sun.nio.ch.NativeThread.current
> 48 0.01% 99.43% 28 300785 sun.nio.ch.EPollArrayWrapper.epollCtl
> 49 0.01% 99.44% 25 301055 sun.nio.ch.EPollSelectorImpl.wakeup
> 50 0.01% 99.45% 22 300806 java.lang.Thread.currentThread
> CPU SAMPLES END
>
> On Fri, Nov 21, 2014 at 5:05 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Great. There is a single I/O thread per producer client that does the
> > sending, so it could be that either the sender thread or that thread is
> > just pegged. One way to dive in and see what is happening is to add the
> > command line option
> >     *-agentlib:hprof=cpu=samples,depth=10*
> > This will tell us where the time is going. If you send that around it may
> > be informative.
> >
> > -Jay
> >
> > On Thu, Nov 20, 2014 at 12:41 AM, Manu Zhang <owenzhang1...@gmail.com> wrote:
> >
> > > Thanks Jay. The producer metrics from jconsole are quite helpful.
> > >
> > > I've switched to the new producer and run the producer benchmark with
> > >
> > >     /usr/lib/kafka/bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> > >         topic1 500000000 1000 -1 acks=1
> > >         bootstrap.servers=node1:9092,node2:9092,node3:9092,node4:9092
> > >         buffer.memory=2097152000 batch.size=1000000 linger.ms=100
> > >
> > > so my message size is 1000 bytes (gave up on 100 bytes after fruitless
> > > experiments) and I've deliberately batched outgoing messages with the
> > > "linger.ms" conf. A single producer could send 300 MB/s on average and 3
> > > producers almost saturated the network bandwidth. CPU is fully utilized for
> > > each producer thread. It seems that I can't go further with a single
> > > producer. Any thoughts?
> > >
> > > Also, I've noticed this kafka-fast <https://github.com/gerritjvv/kafka-fast>
> > > project, which claims producer throughput could reach 191975 K messages/s
> > > for 1KB messages on a 10GbE network. The difference is that a producer is
> > > created per topic partition.
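A side note on the producer-per-partition point above: each producer instance has exactly one sender (I/O) thread, so one blunt way to test whether a single client instance is the ceiling is to run several instances in one JVM and spread the load across them. A rough sketch, assuming a 0.8.2-style client with serializer configs (an older trunk build may take byte[] directly and not need them); the bootstrap servers, topic, and counts are placeholders copied from the benchmark command above:

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MultiProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node1:9092,node2:9092,node3:9092,node4:9092");
        props.put("acks", "1");
        props.put("batch.size", "1000000");
        props.put("linger.ms", "100");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        int numProducers = 4;   // each instance brings its own sender (I/O) thread
        List<KafkaProducer<byte[], byte[]>> producers = new ArrayList<>();
        for (int i = 0; i < numProducers; i++) {
            producers.add(new KafkaProducer<byte[], byte[]>(props));
        }

        byte[] payload = new byte[1000];   // 1000-byte messages, as in the benchmark
        long numRecords = 10_000_000L;
        for (long i = 0; i < numRecords; i++) {
            // Round-robin so every instance (and its sender thread) carries a share of the load.
            producers.get((int) (i % numProducers))
                     .send(new ProducerRecord<byte[], byte[]>("topic1", payload));
        }
        for (KafkaProducer<byte[], byte[]> p : producers) {
            p.close();
        }
    }
}

A per-partition variant, closer to what kafka-fast does, would pin each instance to a fixed partition via the ProducerRecord(topic, partition, key, value) constructor; either way the point is just to see whether adding sender threads moves the throughput ceiling.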
> > > On Wed, Nov 19, 2014 at 12:34 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > >
> > > > Yeah this will involve some experimentation.
> > > >
> > > > The metrics are visible with jconsole or another JMX viewer.
> > > >
> > > > It may also be worth looking at the cpu usage per-thread (e.g. start top
> > > > and press 't', I think).
> > > >
> > > > Another simple test for broker vs client as the bottleneck is just to
> > > > start another producer or consumer and see if that improves throughput
> > > > (if so it is probably a client bottleneck).
> > > >
> > > > -Jay
> > > >
> > > > On Tue, Nov 18, 2014 at 4:44 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
> > > >
> > > > > Thanks Jay for the quick response.
> > > > >
> > > > > Yes, it's a single producer and a single consumer, both configured with
> > > > > multiple threads, but I'm not using the new producer.
> > > > > CPU is typically 50% utilized on the client and barely used on the broker.
> > > > > Disks aren't busy either, as a lot of data are cached in memory.
> > > > > Would you please give a link for the producer metrics you are referring to?
> > > > >
> > > > > Thanks,
> > > > > Manu
> > > > >
> > > > > On Wed, Nov 19, 2014 at 2:39 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > >
> > > > > > Hey Manu,
> > > > > >
> > > > > > I'm not aware of a benchmark on 10GbE. I'd love to see that though.
> > > > > > Diving into the results may help us find bottlenecks hidden by the
> > > > > > slower network.
> > > > > >
> > > > > > Can you figure out where the bottleneck is in your test? I assume this
> > > > > > is a single producer and consumer instance and you are using the new
> > > > > > producer as in those benchmarks?
> > > > > >
> > > > > > This can be slightly tricky as it can be cpu or I/O on either the
> > > > > > clients or the brokers. You basically have to look at top, iostat, and
> > > > > > the jmx metrics for clues. The producer has good metrics that explain
> > > > > > whether it is spending most of its time waiting or sending data. Not
> > > > > > sure if there is a similar diagnostic for the consumer.
> > > > > >
> > > > > > -Jay
> > > > > >
> > > > > > On Tue, Nov 18, 2014 at 5:10 AM, Manu Zhang <owenzhang1...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I have been trying out the kafka benchmarks described in Jay's
> > > > > > > benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> > > > > > > <https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machine>.
> > > > > > > I'm able to get similar results on a 4-node GbE network whose in-bytes
> > > > > > > could be saturated at 120MB/s. However, on a 4-node, 10GbE network, I
> > > > > > > cannot get in-bytes higher than 150MB/s. *Has anyone benchmarked kafka
> > > > > > > on a 10GbE network? Any rule of thumb on a 10GbE network for
> > > > > > > configurations of broker, producer and consumer?*
> > > > > > >
> > > > > > > My kafka version is 0.8.1.1 and I've created a topic with 8 partitions
> > > > > > > and 1 replica, distributed evenly among the 4 nodes. Message size is
> > > > > > > 100 bytes. I use all the default kafka settings.
> > > > > > > My cluster has 4 nodes, where each node has 32 cores, 128 GB RAM and
> > > > > > > 3 disks for kafka.
> > > > > > >
> > > > > > > I've tried increasing the message size to 1000 bytes, which improved
> > > > > > > the producer's throughput but not the consumer's.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Manu