Benjamin, do you mean a thread on the client side? I don't quite get what I'm limited by. Can you please explain a little bit more?
A single-threaded producer is still capable of 50 MB/s on hi1.4xlarge, which is much slower than the 377 MB/s from a single FIO job but still 5 times faster than what I'm getting from the consumer. Is that expected?

Another mystery for me is that with a hot IO cache (the whole topic is in memory) I'm getting 50-100 MB/s (this huge std. dev. bugs me too) from a single-threaded consumer.

And when the cache is cold, I don't see the kafka broker making the most of the SSD it has. I've tried setting fetch-size to 100 MB, but kafka still hits the disk at 10 MB/s (the disk by itself can satisfy many more read requests at the same latency and provide much higher throughput).

To me it looks as if sendfile (http://man7.org/linux/man-pages/man2/sendfile.2.html) somehow works inefficiently with the SSD. I don't understand why, or how this can be fixed.

I do understand that you are advising me to use more partitions and more consumer threads. But I would like to know the limits I'm hitting in this single-threaded mode.

Thanks!

Rafael Bagmanov,
Grid Dynamics

2013/8/30 Benjamin Black <b...@b3k.us>:

> You are maxing out the single consumer thread.
> On Aug 30, 2013 1:35 AM, "Rafael Bagmanov" <bugzma...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to understand how fast kafka 0.7 is compared to what I can get
>> from the hard drive. In essence I have 3 questions.
>>
>> In all tests below, I'm using a single broker with a single one-partitioned
>> topic. Kafka perf tests have been run in 2 deployment configs:
>> - broker, perf-test on same host
>> - broker, perf-test on different hosts (the results are practically the
>> same, so I won't post them here)
>>
>>
>> I'm using FIO (http://freecode.com/projects/fio) to benchmark the speed of
>> the hard drives.
>>
>> Hardware I'm using:
>> 1) m1.xlarge with ephemeral storage, 4 core cpu, 16 GB ram
>> 2) hi1.4xlarge with SSD, 16 core cpu, 64 GB ram
>> 3) desktop machine with 7200 rpm sata, 4 core cpu, 8 GB ram
>>
>> Kafka broker config:
>> Oracle jdk 1.6.0_38, -Xmx2048
>>
>> socket.send.buffer=16777216
>> socket.receive.buffer=16777216
>> max.socket.request.bytes=104857600
>> log.flush.interval=10000
>> log.default.flush.interval.ms=1000
>> log.default.flush.scheduler.interval.ms=1000
>> num.threads=[num of cores]
>>
>>
>> For kafka-producer-perf-test I'm assuming that IO access pattern is
>> sequential write.
>>
>> Here is the test I ran with FIO:
>>
>> [sequential-write]
>> rw=write
>> size=50G
>> ioengine=sync
>> numjobs=1
>> directory=/tmp/fio
>> filename=redo01.log
>>
>>
>> Here is kafka performance test:
>>
>> ./bin/kafka-producer-perf-test.sh -topic "perf" --batch-size 3000
>> --messages 50000000 --message-size 1300 --brokerinfo
>> broker.list=0:host:9092 --threads [number-of-cores]
>>
>>
>> -------------------------------------------------
>> |       | m1.xlarge | hi1.4xlarge | desktop    |
>> -------------------------------------------------
>> | kafka | 41 MB/s   | 217 MB/s    | 42 MB/s    |
>> -------------------------------------------------
>> | fio   | 106 MB/s  | 377 MB/s    | 74 MB/s    |
>> -------------------------------------------------
>>
>>
>> Question 1: The proportion (~1/2) is pretty stable against different kind
>> of hardware I've tried. Is it as expected? Can something be done to improve
>> this?
>>
>> I've tried to play with:
>> log.flush.interval=10000
>> log.default.flush.interval.ms=1000
>> log.default.flush.scheduler.interval.ms=1000
>>
>> Like increasing 10 times, or decreasing 10 times, but haven't seen much of
>> a difference in IO throughput.
>>
>> The other thing that bugs me much more is that kafka consumer speed on a cold
>> IO cache is about 5-50 times slower than what I can get with the "sequential
>> read" fio test.
>>
>> For kafka-consumer-perf-test I'm assuming that IO access pattern is
>> sequential read.
>>
>> Here is the FIO test:
>>
>> [sequential-read]
>> rw=read
>> size=50G
>> ioengine=sync # I know that kafka uses sendfile, but sync should be slower, right?
>> numjobs=1
>> directory=/tmp/fio
>> filename=redo01.log
>>
>> Here is what I'm doing with kafka-consumer-perf-test:
>>
>> kafka-consumer-perf-test.sh -topic "perf" --messages 50000000 --zookeeper
>> host:2181 --threads 1 --socket-buffer-size 16777216 --fetch-size 16777216
>>
>> The broker config is the same.
>>
>> I'm dropping the IO cache before running tests:
>> echo 3 > /proc/sys/vm/drop_caches
>>
>>
>> ------------------------------------------------------
>> |       | m1.xlarge | hi1.4xlarge     | desktop    |
>> ------------------------------------------------------
>> | kafka | 25 MB/s   | 10 MB/s (???)   | 20 MB/s    |
>> ------------------------------------------------------
>> | fio   | 130 MB/s  | 450 MB/s        | 67 MB/s    |
>> ------------------------------------------------------
>>
>> Question 2: Can something be done to improve consumer performance?
>>
>> Question 3 (most important for me): What might be the reasons for the consumer
>> to behave so badly on the fastest hardware available?
>> I see in iostat that the consumer really issues very few read requests to
>> the hard drive:
>>
>> Device:  rrqm/s  wrqm/s     r/s   w/s    rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> xvdb       0.00    0.00  144.00  0.00  6144.00   0.00     85.33      0.06   0.42     0.42     0.00   0.08   1.20
>>
>> And the CPUs are idling:
>>
>> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>>            2.16   0.00     0.09     0.06    0.03  97.66
>>
>>
>> Besides that, even if the whole topic is in the IO cache, the consumer speed is
>> about 45 MB/s, which is still quite below my expectations.
>>
>> And the picture doesn't change in different deployment configs (broker and
>> test on same node or 2 different nodes).
>>
>> Any ideas why this might happen?
>>
>> Rafael Bagmanov,
>> Grid Dynamics.
>>
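
As a rough sanity check on the iostat line quoted above, the arithmetic can be sketched in a few lines of Python. The constants are copied from that xvdb sample; the function names are made up for illustration, and the "serial ceiling" assumes that exactly one read is always in flight and that latency stays at the reported 0.42 ms, which is only an assumption and not something measured in this thread.

```python
# Figures copied from the xvdb line of the quoted iostat output.
READS_PER_SEC = 144.00        # r/s
READ_KB_PER_SEC = 6144.00     # rkB/s
AVG_REQUEST_SECTORS = 85.33   # avgrq-sz (iostat counts 512-byte sectors)
AVG_QUEUE_DEPTH = 0.06        # avgqu-sz
AWAIT_MS = 0.42               # await

SECTOR_BYTES = 512


def avg_request_kb(read_kb_per_sec, reads_per_sec):
    """Average size of one read request in kB."""
    return read_kb_per_sec / reads_per_sec


def implied_queue_depth(reads_per_sec, await_ms):
    """Little's law: requests in flight = arrival rate * time per request."""
    return reads_per_sec * await_ms / 1000.0


def serial_read_ceiling_mb_s(request_kb, await_ms):
    """Throughput if one request of this size were always in flight.

    Assumes latency stays at await_ms under that load (an assumption).
    """
    requests_per_sec = 1000.0 / await_ms
    return request_kb * requests_per_sec / 1024.0


if __name__ == "__main__":
    size_kb = avg_request_kb(READ_KB_PER_SEC, READS_PER_SEC)
    print("avg request size from rkB/s and r/s: %.2f kB" % size_kb)
    print("avg request size from avgrq-sz:      %.2f kB"
          % (AVG_REQUEST_SECTORS * SECTOR_BYTES / 1024.0))
    print("queue depth implied by Little's law: %.3f (iostat: %.2f)"
          % (implied_queue_depth(READS_PER_SEC, AWAIT_MS), AVG_QUEUE_DEPTH))
    print("observed read throughput:            %.1f MB/s"
          % (READ_KB_PER_SEC / 1024.0))
    print("serial one-request-at-a-time ceiling: %.1f MB/s"
          % serial_read_ceiling_mb_s(size_kb, AWAIT_MS))
```

If the numbers hold, the device spends most of each second with no request outstanding, which would suggest the wait is somewhere between reads rather than in the disk itself. This does not say where that wait is.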