So I suspect that the bottleneck is actually in the writer thread (the one calling send()), not the I/O thread. You could verify this by checking the JMX stats, which report how much time the I/O thread spends waiting. The epollWait that shows up near the top of the trace is just the I/O thread waiting for work, which points the same way.
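If it is easier than wiring up a JMX viewer, here is a minimal sketch that reads the same stats through the new producer's metrics() API. The waiting-related entries I have in mind are something like io-wait-ratio / io-wait-time-ns-avg in the producer-metrics group, but the exact metric names can vary by client version, so this just prints everything and you can grep for "wait":

import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public final class ProducerMetricsDump {
    // Print every client-side metric; the io-wait entries show how much time the
    // sender (I/O) thread spends idle versus actually writing to the sockets.
    public static void dump(KafkaProducer<?, ?> producer) {
        for (Map.Entry<MetricName, ? extends Metric> e : producer.metrics().entrySet()) {
            MetricName name = e.getKey();
            System.out.printf("%s : %s = %f%n", name.group(), name.name(), e.getValue().value());
        }
    }
}

Call dump(producer) periodically from the thread driving send(); if the io-wait numbers are high, the sender thread is starved for work and the caller is the bottleneck.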
It looks like the big bottleneck is computing the crc32s for the messages. The next big hit after that is signaling the I/O thread to wake up and do work. Here is an annotated version of those traces:

These two are bogus and are just background JMX things I think:
 1 39.30% 39.30% 79585 300923 java.net.SocketInputStream.socketRead0
 2 20.62% 59.92% 41750 300450 java.net.PlainSocketImpl.socketAccept

This is the I/O thread waiting for work to do:
 3 9.52% 69.45% 19287 300660 sun.nio.ch.EPollArrayWrapper.epollWait

These are the real problems:
 4 9.50% 78.94% 19234 300728 org.apache.kafka.common.record.Record.computeChecksum
 5 4.14% 83.08% 8377 300777 sun.nio.ch.EPollArrayWrapper.interrupt

I/O thread doing a write:
 6 2.30% 85.38% 4662 300708 sun.nio.ch.FileDispatcherImpl.writev0

This is a one-time thing when fetching metadata on startup:
 7 1.61% 86.99% 3260 300752 org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata

These are all in the I/O thread so not relevant:
 8 1.24% 88.23% 2501 300804 sun.nio.ch.EPollArrayWrapper.epollWait
 9 1.08% 89.31% 2187 300734 org.apache.kafka.clients.producer.internals.RecordBatch.done
10 0.98% 90.29% 1991 300870 org.apache.kafka.common.protocol.types.Type$6.write
11 0.97% 91.26% 1961 300789 org.apache.kafka.clients.producer.internals.RecordAccumulator.ready
12 0.96% 92.22% 1951 300726 org.apache.kafka.common.record.MemoryRecords.append
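To put a number on the checksum cost: as the trace shows, the CRC is computed over each record's bytes in the caller's thread on the send() path, so at ~300 MB/s of 1000-byte messages that is roughly 300K checksums per second. Here is a rough standalone sketch of that cost, using the JDK's java.util.zip.CRC32 rather than the client's own CRC implementation, so the absolute numbers will differ; the message size and count are just placeholders taken from the benchmark below:

import java.util.zip.CRC32;

public class CrcCostSketch {
    public static void main(String[] args) {
        byte[] payload = new byte[1000];   // roughly the benchmark's 1000-byte messages
        long messages = 1_000_000;

        long start = System.nanoTime();
        long blackhole = 0;
        for (long i = 0; i < messages; i++) {
            CRC32 crc = new CRC32();
            crc.update(payload, 0, payload.length);   // one checksum per record
            blackhole += crc.getValue();
        }
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d crcs in %.2fs = %.0f MB/s of checksummed payload (ignore: %d)%n",
                messages, secs, messages * payload.length / secs / 1e6, blackhole);
    }
}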
-Jay

On Thu, Nov 20, 2014 at 5:42 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:

> Ok, here is the hprof output
>
> CPU SAMPLES BEGIN (total = 202493) Fri Nov 21 08:07:51 2014
> rank self accum count trace method
>  1 39.30% 39.30% 79585 300923 java.net.SocketInputStream.socketRead0
>  2 20.62% 59.92% 41750 300450 java.net.PlainSocketImpl.socketAccept
>  3 9.52% 69.45% 19287 300660 sun.nio.ch.EPollArrayWrapper.epollWait
>  4 9.50% 78.94% 19234 300728 org.apache.kafka.common.record.Record.computeChecksum
>  5 4.14% 83.08% 8377 300777 sun.nio.ch.EPollArrayWrapper.interrupt
>  6 2.30% 85.38% 4662 300708 sun.nio.ch.FileDispatcherImpl.writev0
>  7 1.61% 86.99% 3260 300752 org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>  8 1.24% 88.23% 2501 300804 sun.nio.ch.EPollArrayWrapper.epollWait
>  9 1.08% 89.31% 2187 300734 org.apache.kafka.clients.producer.internals.RecordBatch.done
> 10 0.98% 90.29% 1991 300870 org.apache.kafka.common.protocol.types.Type$6.write
> 11 0.97% 91.26% 1961 300789 org.apache.kafka.clients.producer.internals.RecordAccumulator.ready
> 12 0.96% 92.22% 1951 300726 org.apache.kafka.common.record.MemoryRecords.append
> 13 0.89% 93.12% 1809 300829 java.nio.Bits.copyFromArray
> 14 0.75% 93.86% 1510 300722 java.nio.HeapByteBuffer.<init>
> 15 0.54% 94.41% 1100 300730 org.apache.kafka.common.record.Compressor.put
> 16 0.54% 94.95% 1094 300749 org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> 17 0.38% 95.33% 771 300755 org.apache.kafka.clients.producer.KafkaProducer.send
> 18 0.36% 95.69% 736 300830 org.apache.kafka.common.metrics.Sensor.record
> 19 0.35% 96.04% 709 300848 sun.nio.ch.IOUtil.drain
> 20 0.33% 96.37% 665 300814 sun.nio.ch.IOUtil.drain
> 21 0.32% 96.69% 644 300812 org.apache.kafka.common.metrics.Sensor.record
> 22 0.31% 97.00% 626 300725 org.apache.kafka.clients.producer.internals.Partitioner.partition
> 23 0.28% 97.28% 571 300729 org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> 24 0.26% 97.54% 535 300764 org.apache.log4j.Category.getEffectiveLevel
> 25 0.25% 97.79% 501 300924 org.apache.kafka.common.protocol.types.Schema.write
> 26 0.19% 97.98% 392 300802 org.apache.kafka.common.metrics.Sensor.record
> 27 0.19% 98.17% 386 300797 org.apache.kafka.common.metrics.Sensor.record
> 28 0.17% 98.34% 342 300739 org.apache.kafka.common.record.Record.write
> 29 0.16% 98.50% 315 300792 org.apache.kafka.common.record.Record.write
> 30 0.15% 98.64% 294 300757 org.apache.kafka.common.record.Record.write
> 31 0.12% 98.76% 238 300731 org.apache.kafka.common.record.Record.write
> 32 0.09% 98.85% 180 300747 org.apache.kafka.clients.producer.KafkaProducer.send
> 33 0.09% 98.94% 177 300750 org.apache.kafka.common.record.Record.write
> 34 0.06% 98.99% 112 300851 sun.nio.ch.NativeThread.current
> 35 0.05% 99.05% 110 300753 org.apache.kafka.clients.producer.KafkaProducer.send
> 36 0.05% 99.09% 93 300723 java.lang.System.arraycopy
> 37 0.04% 99.13% 80 300872 org.apache.kafka.clients.tools.ProducerPerformance.main
> 38 0.04% 99.17% 78 300770 java.util.HashMap.getEntry
> 39 0.04% 99.21% 78 300859 org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> 40 0.04% 99.25% 73 300861 sun.nio.ch.EPollArrayWrapper.epollCtl
> 41 0.03% 99.28% 67 300718 sun.misc.Unsafe.copyMemory
> 42 0.03% 99.31% 59 300737 org.apache.kafka.clients.producer.internals.Metadata.timeToNextUpdate
> 43 0.03% 99.33% 52 300816 org.apache.kafka.clients.producer.internals.Metadata.fetch
> 44 0.02% 99.36% 48 300715 sun.nio.ch.FileDispatcherImpl.read0
> 45 0.02% 99.38% 42 300794 org.apache.log4j.Category.getEffectiveLevel
> 46 0.02% 99.40% 41 300740 org.apache.kafka.clients.producer.internals.Metadata.timeToNextUpdate
> 47 0.02% 99.42% 40 300795 sun.nio.ch.NativeThread.current
> 48 0.01% 99.43% 28 300785 sun.nio.ch.EPollArrayWrapper.epollCtl
> 49 0.01% 99.44% 25 301055 sun.nio.ch.EPollSelectorImpl.wakeup
> 50 0.01% 99.45% 22 300806 java.lang.Thread.currentThread
> CPU SAMPLES END
>
> On Fri, Nov 21, 2014 at 5:05 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Great. There is a single I/O thread per producer client that does the
> > sending, so it could be that either the sender thread or that thread is
> > just pegged. One way to dive in and see what is happening is to add the
> > command line option
> >     *-agentlib:hprof=cpu=samples,depth=10*
> > This will tell us where the time is going. If you send that around it may
> > be informative.
> >
> > -Jay
> >
> > On Thu, Nov 20, 2014 at 12:41 AM, Manu Zhang <owenzhang1...@gmail.com> wrote:
> >
> > > Thanks Jay. The producer metrics from jconsole are quite helpful.
> > >
> > > I've switched to the new producer and run the producer benchmark with
> > >
> > >     /usr/lib/kafka/bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> > >         topic1 500000000 1000 -1 acks=1
> > >         bootstrap.servers=node1:9092,node2:9092,node3:9092,node4:9092
> > >         buffer.memory=2097152000 batch.size=1000000 linger.ms=100
> > >
> > > so my message size is 1000 bytes (gave up on 100 bytes after fruitless
> > > experiments) and I've deliberately batched outgoing messages with the
> > > "linger.ms" conf. A single producer could send 300 MB/s on average and 3
> > > producers almost saturated the network bandwidth. CPU is fully utilized for
> > > each producer thread. It seems that I can't go further with a single
> > > producer. Any thoughts?
> > >
> > > Also, I've noticed this kafka-fast <https://github.com/gerritjvv/kafka-fast>
> > > project, which claims producer throughput could reach 191975 K messages/s
> > > for 1KB messages on a 10GbE network. The difference is that a producer is
> > > created per topic partition.
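A side note on the producer-per-partition point above: each producer instance has exactly one sender (I/O) thread, so one blunt way to test whether a single client instance is the ceiling is to run several instances in one JVM and spread the load across them. A rough sketch, assuming a 0.8.2-style client with serializer configs (an older trunk build may take byte[] directly and not need them); the bootstrap servers, topic, and counts are placeholders copied from the benchmark command above:

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MultiProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node1:9092,node2:9092,node3:9092,node4:9092");
        props.put("acks", "1");
        props.put("batch.size", "1000000");
        props.put("linger.ms", "100");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        int numProducers = 4;   // each instance brings its own sender (I/O) thread
        List<KafkaProducer<byte[], byte[]>> producers = new ArrayList<>();
        for (int i = 0; i < numProducers; i++) {
            producers.add(new KafkaProducer<byte[], byte[]>(props));
        }

        byte[] payload = new byte[1000];   // 1000-byte messages, as in the benchmark
        long numRecords = 10_000_000L;
        for (long i = 0; i < numRecords; i++) {
            // Round-robin so every instance (and its sender thread) carries a share of the load.
            producers.get((int) (i % numProducers))
                     .send(new ProducerRecord<byte[], byte[]>("topic1", payload));
        }
        for (KafkaProducer<byte[], byte[]> p : producers) {
            p.close();
        }
    }
}

A per-partition variant, closer to what kafka-fast does, would pin each instance to a fixed partition via the ProducerRecord(topic, partition, key, value) constructor; either way the point is just to see whether adding sender threads moves the throughput ceiling.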
> > > On Wed, Nov 19, 2014 at 12:34 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > >
> > > > Yeah this will involve some experimentation.
> > > >
> > > > The metrics are visible with jconsole or another JMX viewer.
> > > >
> > > > It may also be worth looking at the cpu usage per-thread (e.g. start top
> > > > and press 't', I think).
> > > >
> > > > Another simple test for broker vs client as the bottleneck is just to
> > > > start another producer or consumer and see if that improves throughput
> > > > (if so it is probably a client bottleneck).
> > > >
> > > > -Jay
> > > >
> > > > On Tue, Nov 18, 2014 at 4:44 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
> > > >
> > > > > Thanks Jay for the quick response.
> > > > >
> > > > > Yes, it's a single producer and a single consumer, both configured with
> > > > > multiple threads, but I'm not using the new producer.
> > > > > CPU is typically 50% utilized on the client and barely used on the broker.
> > > > > Disks aren't busy either, as a lot of data are cached in memory.
> > > > > Would you please give a link for the producer metrics you are referring to?
> > > > >
> > > > > Thanks,
> > > > > Manu
> > > > >
> > > > > On Wed, Nov 19, 2014 at 2:39 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > >
> > > > > > Hey Manu,
> > > > > >
> > > > > > I'm not aware of a benchmark on 10GbE. I'd love to see that though.
> > > > > > Diving into the results may help us find bottlenecks hidden by the
> > > > > > slower network.
> > > > > >
> > > > > > Can you figure out where the bottleneck is in your test? I assume this
> > > > > > is a single producer and consumer instance and you are using the new
> > > > > > producer as in those benchmarks?
> > > > > >
> > > > > > This can be slightly tricky as it can be cpu or I/O on either the
> > > > > > clients or the brokers. You basically have to look at top, iostat, and
> > > > > > the jmx metrics for clues. The producer has good metrics that explain
> > > > > > whether it is spending most of its time waiting or sending data. Not
> > > > > > sure if there is a similar diagnostic for the consumer.
> > > > > >
> > > > > > -Jay
> > > > > >
> > > > > > On Tue, Nov 18, 2014 at 5:10 AM, Manu Zhang <owenzhang1...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I have been trying out the kafka benchmarks described in Jay's
> > > > > > > benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> > > > > > > <https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machine>.
> > > > > > > I'm able to get similar results on a 4-node GbE network whose in-bytes
> > > > > > > could be saturated at 120MB/s. However, on a 4-node, 10GbE network, I
> > > > > > > cannot get in-bytes higher than 150MB/s. *Has anyone benchmarked kafka
> > > > > > > on a 10GbE network? Any rule of thumb on a 10GbE network for
> > > > > > > configurations of broker, producer and consumer?*
> > > > > > >
> > > > > > > My kafka version is 0.8.1.1 and I've created a topic with 8 partitions
> > > > > > > and 1 replica, distributed evenly among the 4 nodes. Message size is
> > > > > > > 100 bytes. I use all the default kafka settings.
> > > > > > > My cluster has 4 nodes, where each node has 32 cores, 128 GB RAM and
> > > > > > > 3 disks for kafka.
> > > > > > >
> > > > > > > I've tried increasing the message size to 1000 bytes, which improved
> > > > > > > the producer's throughput but not the consumer's.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Manu