Thanks for the explanation.

Here are some stats for the I/O thread.

*io-ratio 0.155*
*io-time-ns-avg 16418.5*
*io-wait-ratio 0.59*
*io-wait-time-ns-avg  62881*

These stats seem to confirm that the I/O thread spends far more time waiting
than doing real work.
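
In case it helps, here is a rough sketch of how the same numbers could be
pulled programmatically from the new producer via KafkaProducer.metrics()
(the metric names below are the ones I see in jconsole, so they may differ
in other client versions):

import java.util.Map;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class ProducerIoMetrics {

    // Dump the producer's I/O metrics; call this periodically from the
    // benchmark thread while messages are in flight. The metric names are
    // assumed to match what jconsole shows for this client version.
    public static void dump(KafkaProducer<byte[], byte[]> producer) {
        for (Map.Entry<MetricName, ? extends Metric> entry : producer.metrics().entrySet()) {
            String name = entry.getKey().name();
            if (name.equals("io-ratio") || name.equals("io-wait-ratio")
                    || name.equals("io-time-ns-avg") || name.equals("io-wait-time-ns-avg")
                    || name.equals("record-queue-time-avg") || name.equals("batch-size-avg")) {
                System.out.printf("%-22s %f%n", name, entry.getValue().value());
            }
        }
    }
}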

Given the above stats, how can I track down and pinpoint the bottleneck?
I guess computing CRC32s cannot be avoided.
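
To get a feel for how expensive the raw checksum is on this box, a quick
stand-alone micro-benchmark along these lines might help (a rough sketch
using java.util.zip.CRC32, not the producer's own checksum code path):

import java.util.zip.CRC32;

public class Crc32Bench {

    // Rough estimate of raw CRC32 throughput: checksum ~1 GB of data in
    // 1000-byte chunks (same size as my messages) and report MB/s.
    public static void main(String[] args) {
        byte[] payload = new byte[1000];
        long totalBytes = 1024L * 1024 * 1024;
        long chunks = totalBytes / payload.length;

        long start = System.nanoTime();
        long sink = 0; // keeps the JIT from eliding the loop
        for (long i = 0; i < chunks; i++) {
            CRC32 crc = new CRC32();
            crc.update(payload, 0, payload.length);
            sink += crc.getValue();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("crc32: %.1f MB/s (sink=%d)%n",
                (totalBytes / (1024.0 * 1024.0)) / seconds, sink);
    }
}

If the raw CRC32 rate comes out far above the producer's actual byte rate,
then the checksum alone probably isn't the whole story.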

On Fri, Nov 21, 2014 at 12:34 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> So I suspect that the bottleneck is actually in the writer thread (the one
> calling send()), not the I/O thread. You could verify this by checking the
> JMX stats which will give the amount of time the I/O thread spends waiting.
> But since epollWait shows up first that is the I/O thread waiting for work.
>
> It looks like the big bottleneck is computing the crc32s for the messages.
> The next big hit after that is signaling the I/O thread to wake up and do
> work.
>
> Here is an annotated version of those traces:
>
> These two are bogus and are just background JMX things I think:
>  1 39.30% 39.30%   79585 300923 java.net.SocketInputStream.socketRead0
>    2 20.62% 59.92%   41750 300450 java.net.PlainSocketImpl.socketAccept
>
> This is the I/O thread waiting for work to do
>    3  9.52% 69.45%   19287 300660 sun.nio.ch.EPollArrayWrapper.epollWait
>
> These are the real problems:
>    4  9.50% 78.94%   19234 300728 org.apache.kafka.common.
> record.Record.computeChecksum
>    5  4.14% 83.08%    8377 300777 sun.nio.ch.EPollArrayWrapper.interrupt
>
> I/O thread doing a write
>    6  2.30% 85.38%    4662 300708 sun.nio.ch.FileDispatcherImpl.writev0
>
> This is a one time thing when fetching metadata on startup
>    7  1.61% 86.99%    3260 300752
> org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
>
> These are all in the I/O thread so not relevant:
>    8  1.24% 88.23%    2501 300804 sun.nio.ch.EPollArrayWrapper.epollWait
>    9  1.08% 89.31%    2187 300734
> org.apache.kafka.clients.producer.internals.RecordBatch.done
>   10  0.98% 90.29%    1991 300870
> org.apache.kafka.common.protocol.types.Type$6.write
>   11  0.97% 91.26%    1961 300789
> org.apache.kafka.clients.producer.internals.RecordAccumulator.ready
>   12  0.96% 92.22%    1951 300726
>
> -Jay
>
> On Thu, Nov 20, 2014 at 5:42 PM, Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
> > Ok, here is the hrpof output
> >
> > CPU SAMPLES BEGIN (total = 202493) Fri Nov 21 08:07:51 2014
> > rank   self  accum   count trace method
> >    1 39.30% 39.30%   79585 300923 java.net.SocketInputStream.socketRead0
> >    2 20.62% 59.92%   41750 300450 java.net.PlainSocketImpl.socketAccept
> >    3  9.52% 69.45%   19287 300660 sun.nio.ch.EPollArrayWrapper.epollWait
> >    4  9.50% 78.94%   19234 300728
> > org.apache.kafka.common.record.Record.computeChecksum
> >    5  4.14% 83.08%    8377 300777 sun.nio.ch.EPollArrayWrapper.interrupt
> >    6  2.30% 85.38%    4662 300708 sun.nio.ch.FileDispatcherImpl.writev0
> >    7  1.61% 86.99%    3260 300752
> > org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata
> >    8  1.24% 88.23%    2501 300804 sun.nio.ch.EPollArrayWrapper.epollWait
> >    9  1.08% 89.31%    2187 300734
> > org.apache.kafka.clients.producer.internals.RecordBatch.done
> >   10  0.98% 90.29%    1991 300870
> > org.apache.kafka.common.protocol.types.Type$6.write
> >   11  0.97% 91.26%    1961 300789
> > org.apache.kafka.clients.producer.internals.RecordAccumulator.ready
> >   12  0.96% 92.22%    1951 300726
> > org.apache.kafka.common.record.MemoryRecords.append
> >   13  0.89% 93.12%    1809 300829 java.nio.Bits.copyFromArray
> >   14  0.75% 93.86%    1510 300722 java.nio.HeapByteBuffer.<init>
> >   15  0.54% 94.41%    1100 300730
> > org.apache.kafka.common.record.Compressor.put
> >   16  0.54% 94.95%    1094 300749
> > org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> >   17  0.38% 95.33%     771 300755
> > org.apache.kafka.clients.producer.KafkaProducer.send
> >   18  0.36% 95.69%     736 300830
> > org.apache.kafka.common.metrics.Sensor.record
> >   19  0.35% 96.04%     709 300848 sun.nio.ch.IOUtil.drain
> >   20  0.33% 96.37%     665 300814 sun.nio.ch.IOUtil.drain
> >   21  0.32% 96.69%     644 300812
> > org.apache.kafka.common.metrics.Sensor.record
> >   22  0.31% 97.00%     626 300725
> > org.apache.kafka.clients.producer.internals.Partitioner.partition
> >   23  0.28% 97.28%     571 300729
> > org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> >   24  0.26% 97.54%     535 300764
> > org.apache.log4j.Category.getEffectiveLevel
> >   25  0.25% 97.79%     501 300924
> > org.apache.kafka.common.protocol.types.Schema.write
> >   26  0.19% 97.98%     392 300802
> > org.apache.kafka.common.metrics.Sensor.record
> >   27  0.19% 98.17%     386 300797
> > org.apache.kafka.common.metrics.Sensor.record
> >   28  0.17% 98.34%     342 300739
> > org.apache.kafka.common.record.Record.write
> >   29  0.16% 98.50%     315 300792
> > org.apache.kafka.common.record.Record.write
> >   30  0.15% 98.64%     294 300757
> > org.apache.kafka.common.record.Record.write
> >   31  0.12% 98.76%     238 300731
> > org.apache.kafka.common.record.Record.write
> >   32  0.09% 98.85%     180 300747
> > org.apache.kafka.clients.producer.KafkaProducer.send
> >   33  0.09% 98.94%     177 300750
> > org.apache.kafka.common.record.Record.write
> >   34  0.06% 98.99%     112 300851 sun.nio.ch.NativeThread.current
> >   35  0.05% 99.05%     110 300753
> > org.apache.kafka.clients.producer.KafkaProducer.send
> >   36  0.05% 99.09%      93 300723 java.lang.System.arraycopy
> >   37  0.04% 99.13%      80 300872
> > org.apache.kafka.clients.tools.ProducerPerformance.main
> >   38  0.04% 99.17%      78 300770 java.util.HashMap.getEntry
> >   39  0.04% 99.21%      78 300859
> > org.apache.kafka.clients.producer.internals.RecordAccumulator.append
> >   40  0.04% 99.25%      73 300861 sun.nio.ch.EPollArrayWrapper.epollCtl
> >   41  0.03% 99.28%      67 300718 sun.misc.Unsafe.copyMemory
> >   42  0.03% 99.31%      59 300737
> > org.apache.kafka.clients.producer.internals.Metadata.timeToNextUpdate
> >   43  0.03% 99.33%      52 300816
> > org.apache.kafka.clients.producer.internals.Metadata.fetch
> >   44  0.02% 99.36%      48 300715 sun.nio.ch.FileDispatcherImpl.read0
> >   45  0.02% 99.38%      42 300794
> > org.apache.log4j.Category.getEffectiveLevel
> >   46  0.02% 99.40%      41 300740
> > org.apache.kafka.clients.producer.internals.Metadata.timeToNextUpdate
> >   47  0.02% 99.42%      40 300795 sun.nio.ch.NativeThread.current
> >   48  0.01% 99.43%      28 300785 sun.nio.ch.EPollArrayWrapper.epollCtl
> >   49  0.01% 99.44%      25 301055 sun.nio.ch.EPollSelectorImpl.wakeup
> >   50  0.01% 99.45%      22 300806 java.lang.Thread.currentThread
> > CPU SAMPLES END
> >
> >
> > On Fri, Nov 21, 2014 at 5:05 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > Great. There is a single I/O thread per producer client that does the
> > > sending, so it could be that either the sender thread or that I/O thread
> > > is just pegged. One way to dive in and see what is happening is to add
> > > the command line option
> > > *  -agentlib:hprof=cpu=samples,depth=10*
> > > This will tell us where the time is going. If you send that around it
> > > may be informative.
> > >
> > > -Jay
> > >
> > >
> > > On Thu, Nov 20, 2014 at 12:41 AM, Manu Zhang <owenzhang1...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Jay. The producer metrics from jconsole is quite helpful.
> > > >
> > > > I've switched to the new producer and run producer benchmark with
> > > >
> > > > */usr/lib/kafka/bin/kafka-run-class.sh
> > > > org.apache.kafka.clients.tools.ProducerPerformance topic1 500000000
> > > > 1000 -1 acks=1
> > > > bootstrap.servers=node1:9092,node2:9092,node3:9092,node4:9092
> > > > buffer.memory=2097152000 batch.size=1000000 linger.ms=100*
> > > >
> > > > so my message size is 1000 bytes (gave up on 100 bytes after fruitless
> > > > experiments) and I've deliberately batched outgoing messages with the
> > > > "linger.ms" conf. A single producer could send 300 MB/s on average and
> > > > 3 producers almost saturated the network bandwidth. CPU is fully
> > > > utilized for each producer thread. It seems that I can't go further
> > > > with a single producer. Any thoughts?
> > > >
> > > > Also, I've noticed this kafka-fast
> > > > <https://github.com/gerritjvv/kafka-fast> project,
> > > > which claims producer throughput could reach 191975 K messages/s for
> > > > 1KB messages on a 10GbE network. The difference is that a producer is
> > > > created per topic partition.
> > > >
> > > >
> > > > On Wed, Nov 19, 2014 at 12:34 PM, Jay Kreps <jay.kr...@gmail.com>
> > > > wrote:
> > > >
> > > > > Yeah this will involve some experimentation.
> > > > >
> > > > > The metrics are visible with jconsole or another jmx viewer.
> > > > >
> > > > > It may also be worth looking at the cpu usage per-thread (e.g. start
> > > > > top and press 't' I think).
> > > > >
> > > > > Another simple test for broker vs client as the bottleneck is just to
> > > > > start another producer or consumer and see if that improves
> > > > > throughput (if so it is probably a client bottleneck).
> > > > >
> > > > > -Jay
> > > > >
> > > > > On Tue, Nov 18, 2014 at 4:44 PM, Manu Zhang <owenzhang1...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Jay for the quick response.
> > > > > >
> > > > > > Yes, it's a single producer and consumer both configured with
> > > > > > multiple threads, but I'm not using the new producer.
> > > > > > CPU is typically 50% utilized on the client and barely used on the
> > > > > > broker. Disks aren't busy either, as a lot of data is cached in
> > > > > > memory. Would you please give a link for the producer metrics you
> > > > > > are referring to?
> > > > > >
> > > > > > Thanks,
> > > > > > Manu
> > > > > >
> > > > > > On Wed, Nov 19, 2014 at 2:39 AM, Jay Kreps <jay.kr...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Manu,
> > > > > > >
> > > > > > > I'm not aware of a benchmark on 10GbE. I'd love to see that
> > > > > > > though. Diving into the results may help us find bottlenecks
> > > > > > > hidden by the slower network.
> > > > > > >
> > > > > > > Can you figure out where the bottleneck is in your test? I assume
> > > > > > > this is a single producer and consumer instance and you are using
> > > > > > > the new producer as in those benchmarks?
> > > > > > >
> > > > > > > This can be slightly tricky as it can be cpu or I/O on either the
> > > > > > > clients or the brokers. You basically have to look at top, iostat,
> > > > > > > and the jmx metrics for clues. The producer has good metrics that
> > > > > > > explain whether it is spending most of its time waiting or sending
> > > > > > > data. Not sure if there is a similar diagnostic for the consumer.
> > > > > > >
> > > > > > > -Jay
> > > > > > >
> > > > > > > On Tue, Nov 18, 2014 at 5:10 AM, Manu Zhang <owenzhang1...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I have been trying out kafka benchmarks described in Jay's
> > > > > > > > benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> > > > > > > > <https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machine>.
> > > > > > > > I'm able to get similar results on a 4-node GbE network whose
> > > > > > > > in-bytes could be saturated at 120MB/s. However, on a 4-node,
> > > > > > > > 10GbE network, I cannot get in-bytes higher than 150MB/s. *Has
> > > > > > > > anyone benchmarked kafka on a 10GbE network? Any rule of thumb
> > > > > > > > on a 10GbE network for configurations of broker, producer and
> > > > > > > > consumer?*
> > > > > > > >
> > > > > > > > My kafka version is 0.8.1.1 and I've created a topic with 8
> > > > > > > > partitions with 1 replica distributed evenly among the 4 nodes.
> > > > > > > > Message size is 100 bytes. I use all the default kafka settings.
> > > > > > > > My cluster has 4 nodes, where each node has 32 cores, 128MB RAM
> > > > > > > > and 3 disks for kafka.
> > > > > > > >
> > > > > > > > I've tried increasing message size to 1000 bytes, which
> > > > > > > > improved producer's throughput but not consumer's.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Manu
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
