Bert, Thanks for sharing. Which version of Kafka were you testing?
Jun

On Fri, Apr 25, 2014 at 3:11 PM, Bert Corderman <bertc...@gmail.com> wrote:

> I have been testing kafka for the past week or so and figured I would share
> my results so far.
>
> I am not sure if the formatting will keep in email but here are the results
> in a google doc...all 1,100 of them
>
> https://docs.google.com/spreadsheets/d/1UL-o2MiV0gHZtL4jFWNyqRTQl41LFdM0upjRIwCWNgQ/edit?usp=sharing
>
> One thing I found is there appears to be a bottleneck in
> kafka-producer-perf-test.sh
>
> The servers I used for testing have 12 7.2K drives and 16 cores. I was NOT
> able to scale the broker past 350 MB/sec when adding drives even though I
> was able to get 150 MB/sec from a single drive. I wanted to determine the
> source of the low utilization.
>
> I tried changing the following
>
> · log.flush.interval.messages on the broker
> · log.flush.interval.ms on the broker
> · num.io.threads on the broker
> · thread settings on the producer
> · producer message sizes
> · producer batch sizes
> · different numbers of topics (which impacts the number of drives)
>
> None of the above had any impact. The last thing I tried was running
> multiple producers, which had a very noticeable impact. As previously
> mentioned, I had already tested the thread setting of the producer and found
> it to scale when increasing the thread count from 1 to 2, 4 and 8. After that
> it plateaued, so I had been using 8 threads for each test. To show the
> impact of the number of producers I created 12 topics with partition counts
> from 1 to 12. I used a single broker with no replication and configured
> the producer(s) to send 10 million 2200-byte messages in batches of 400
> with no ack.
>
> Running with three producers gives almost double the throughput of one
> producer.
>
> Other key points learned so far
>
> · Ensure you are using the correct network interface (use
>   advertised.host.name if the servers have multiple interfaces).
> · Use batching on the producer. With a single broker, sending 2200-byte
>   messages in batches of 200 resulted in 283 MB/sec, vs. 44 MB/sec with a
>   batch size of 1.
> · The message size, the configuration of request.required.acks and
>   the number of replicas (only when ack is set to all) had the most
>   influence on the overall throughput.
> · The following table shows results of testing with message sizes
>   of 200, 300, 1000 and 2200 bytes on a three node cluster. Each message
>   size was tested with the three available ack modes (NONE, LEADER and ALL)
>   and with replication of two and three copies. Having three copies of data
>   is recommended, however both are included for reference.
>
>                        Replica=2            Replica=3            Per Server
> message.size  acks     MB/sec  nMsg/sec     MB/sec  nMsg/sec     MB/sec  nMsg/sec
> 200           NONE     251     1,313,888    237     1,242,390    79      414,130
> 300           NONE     345     1,204,384    320     1,120,197    107     373,399
> 1000          NONE     522     546,896      515     540,541      172     180,180
> 2200          NONE     368     175,165      367     174,709      122     58,236
> 200           LEADER   115     604,376      141     739,754      47      246,585
> 300           LEADER   186     650,280      192     670,062      64      223,354
> 1000          LEADER   340     356,659      328     343,808      109     114,603
> 2200          LEADER   310     147,846      293     139,729      98      46,576
> 200           ALL      74      385,594      58      304,386      19      101,462
> 300           ALL      105     367,282      78      272,316      26      90,772
> 1000          ALL      203     212,400      124     130,305      41      43,435
> 2200          ALL      212     100,820      136     64,835       45      21,612
>
> Some observations from the above table
>
> · Increasing the number of replicas when request.required.acks is
>   none or leader only has limited impact on overall performance (additional
>   resources are required to replicate data, but during tests this did not
>   impact producer throughput).
> · Compression is not shown, as it was found that the data generated
>   for the test is not realistic for a production workload (GZIP compressed
>   the data 300:1, which is unrealistic).
> · For some reason a message size of 1000 bytes performed the best.
>   Need to look into this more.
>
> Thanks
>
> Bert
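
For anyone wanting to sanity-check the quoted table, here is a minimal Python sketch of the arithmetic. It rests on two assumptions that are not stated in the original message: that "MB" means 2**20 bytes, and that the "Per Server" columns are the Replica=3 figures divided across the three nodes. The names MIB, ROWS, mb_per_sec and check_rows are made up for this illustration. The rows are copied from the Replica=3 and Per Server columns.

```python
# Sanity check of the quoted results table (Replica=3 and Per Server columns).
# Assumption: "MB" in the table means 2**20 bytes; this is not stated in the email.
MIB = 1024 * 1024

# (message size in bytes, acks, MB/sec, nMsg/sec, per-server MB/sec, per-server nMsg/sec)
ROWS = [
    (200, "NONE", 237, 1_242_390, 79, 414_130),
    (300, "NONE", 320, 1_120_197, 107, 373_399),
    (1000, "NONE", 515, 540_541, 172, 180_180),
    (2200, "NONE", 367, 174_709, 122, 58_236),
    (200, "LEADER", 141, 739_754, 47, 246_585),
    (300, "LEADER", 192, 670_062, 64, 223_354),
    (1000, "LEADER", 328, 343_808, 109, 114_603),
    (2200, "LEADER", 293, 139_729, 98, 46_576),
    (200, "ALL", 58, 304_386, 19, 101_462),
    (300, "ALL", 78, 272_316, 26, 90_772),
    (1000, "ALL", 124, 130_305, 41, 43_435),
    (2200, "ALL", 136, 64_835, 45, 21_612),
]


def mb_per_sec(message_size, msgs_per_sec):
    """Throughput implied by a message size and a message rate."""
    return message_size * msgs_per_sec / MIB


def check_rows(rows, nodes=3, tolerance=1.0):
    """Return (size, acks) for rows whose reported figures differ from the
    recomputed ones by more than `tolerance` (the table values are rounded)."""
    mismatches = []
    for size, acks, mb, msgs, ps_mb, ps_msgs in rows:
        ok = (
            abs(mb_per_sec(size, msgs) - mb) <= tolerance
            and abs(mb / nodes - ps_mb) <= tolerance
            and abs(msgs / nodes - ps_msgs) <= tolerance
        )
        if not ok:
            mismatches.append((size, acks))
    return mismatches


if __name__ == "__main__":
    for size, acks, mb, msgs, _, _ in ROWS:
        print(f"{size:>5} {acks:<6} reported {mb:>3} MB/sec, "
              f"recomputed {mb_per_sec(size, msgs):6.1f} MB/sec")
    print("rows that do not reconcile:", check_rows(ROWS))
```

Under those assumptions every row reconciles to within 1 MB/sec and 1 msg/sec, which suggests the MB/sec columns were derived from message size times message rate.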