You are maxing out the single consumer thread.

On Aug 30, 2013 1:35 AM, "Rafael Bagmanov" <bugzma...@gmail.com> wrote:
> Hi,
>
> I am trying to understand how fast Kafka 0.7 is compared to what I can get
> from the hard drive. In essence I have 3 questions.
>
> In all tests below, I'm using a single broker with a single one-partition
> topic. The Kafka perf tests were run in 2 deployment configs:
> - broker and perf-test on the same host
> - broker and perf-test on different hosts (the results are practically the
>   same, so I won't post them here)
>
> I'm using FIO (http://freecode.com/projects/fio) to benchmark the speed of
> the hard drives.
>
> Hardware I'm using:
> 1) m1.xlarge with ephemeral storage, 4 core cpu, 16 GB ram
> 2) hi1.4xlarge with SSD, 16 core cpu, 64 GB ram
> 3) desktop machine with 7200 rpm sata, 4 core cpu, 8 GB ram
>
> Kafka broker config:
> Oracle jdk 1.6.0_38, -Xmx2048
>
> socket.send.buffer=16777216
> socket.receive.buffer=16777216
> max.socket.request.bytes=104857600
> log.flush.interval=10000
> log.default.flush.interval.ms=1000
> log.default.flush.scheduler.interval.ms=1000
> num.threads=[num of cores]
>
> For kafka-producer-perf-test I'm assuming that the IO access pattern is
> sequential write.
>
> Here is the test I ran with FIO:
>
> [sequential-write]
> rw=write
> size=50G
> ioengine=sync
> numjobs=1
> directory=/tmp/fio
> filename=redo01.log
>
> Here is the Kafka performance test:
>
> ./bin/kafka-producer-perf-test.sh -topic "perf" --batch-size 3000
> --messages 50000000 --message-size 1300 --brokerinfo
> broker.list=0:host:9092 --threads [number-of-cores]
>
> ---------------------------------------------
> |       | m1.xlarge | hi1.4xlarge | desktop |
> ---------------------------------------------
> | kafka | 41 MB/s   | 217 MB/s    | 42 MB/s |
> ---------------------------------------------
> | fio   | 106 MB/s  | 377 MB/s    | 74 MB/s |
> ---------------------------------------------
>
> Question 1: The proportion (~1/2) is pretty stable across the different
> kinds of hardware I've tried. Is this expected? Can something be done to
> improve it?
>
> I've tried to play with:
> log.flush.interval=10000
> log.default.flush.interval.ms=1000
> log.default.flush.scheduler.interval.ms=1000
>
> I tried increasing them 10 times and decreasing them 10 times, but haven't
> seen much of a difference in IO throughput.
>
> The other thing that bugs me much more is that the Kafka consumer speed on
> a cold IO cache is 5-50 times slower than what I can get with the
> "sequential read" fio test.
>
> For kafka-consumer-perf-test I'm assuming that the IO access pattern is
> sequential read.
>
> Here is the FIO test:
>
> [sequential-read]
> rw=read
> size=50G
> ioengine=sync # I know that Kafka uses sendfile, but sync should be slower, right?
> numjobs=1
> directory=/tmp/fio
> filename=redo01.log
>
> Here is what I'm doing with kafka-consumer-perf-test:
>
> kafka-consumer-perf-test.sh -topic "perf" --messages 50000000 --zookeeper
> host:2181 --threads 1 --socket-buffer-size 16777216 --fetch-size 16777216
>
> The broker config is the same.
>
> I'm dropping the IO cache before running tests:
> echo 3 > /proc/sys/vm/drop_caches
>
> -----------------------------------------------
> |       | m1.xlarge | hi1.4xlarge   | desktop |
> -----------------------------------------------
> | kafka | 25 MB/s   | 10 MB/s (???) | 20 MB/s |
> -----------------------------------------------
> | fio   | 130 MB/s  | 450 MB/s      | 67 MB/s |
> -----------------------------------------------
>
> Question 2: Can something be done to improve consumer performance?
>
> Question 3 (most important for me): What might be the reasons for the
> consumer to behave so badly on the fastest hardware available? I see in
> iostat that the consumer issues very few read requests to the hard drive:
>
> Device:  rrqm/s  wrqm/s     r/s     w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> xvdb       0.00    0.00  144.00    0.00  6144.00     0.00     85.33      0.06   0.42     0.42     0.00   0.08   1.20
>
> And the CPUs are idling:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            2.16    0.00    0.09    0.06    0.03   97.66
>
> Besides that, even if the whole topic is in the IO cache, the consumer
> speed is about 45 MB/s, which is still well below my expectations.
>
> And the picture doesn't change across deployment configs (broker and
> test on the same node or on 2 different nodes).
>
> Any ideas why this might happen?
>
> Rafael Bagmanov,
> Grid Dynamics.
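
For reference, the ratios behind "~1/2" and "5-50 times slower" can be worked out from the two quoted tables with a few lines of Python. This is only a sketch: the throughput figures and the iostat values are copied from the message above, and the names (HOSTS, PRODUCER, CONSUMER, ratios, avg_read_kb) are made up for illustration and are not part of Kafka or fio.

```python
# Throughput figures (MB/s) copied from the two tables in the quoted message.
HOSTS = ["m1.xlarge", "hi1.4xlarge", "desktop"]
PRODUCER = {"kafka": [41, 217, 42], "fio": [106, 377, 74]}
CONSUMER = {"kafka": [25, 10, 20], "fio": [130, 450, 67]}


def ratios(table):
    """Return {host: kafka_throughput / fio_throughput} for one table."""
    return {
        host: kafka / fio
        for host, kafka, fio in zip(HOSTS, table["kafka"], table["fio"])
    }


def avg_read_kb(rkb_per_s, reads_per_s):
    """Average size of one read request in kB, from iostat's rkB/s and r/s."""
    return rkb_per_s / reads_per_s


if __name__ == "__main__":
    for label, table in (("producer", PRODUCER), ("consumer", CONSUMER)):
        for host, r in ratios(table).items():
            print(f"{label:8s} {host:12s} kafka/fio = {r:.2f}")
    # iostat line for xvdb: rkB/s = 6144.00, r/s = 144.00
    print(f"average read size: {avg_read_kb(6144.0, 144.0):.2f} kB")
```

On the quoted numbers the producer ratio lands between roughly 0.39 and 0.58, and the cold-cache consumer on hi1.4xlarge is about 45 times slower than fio. The iostat line works out to about 42.67 kB per read, which matches the reported avgrq-sz of 85.33 if that column is counted in 512-byte sectors.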