Hi all,

Since my company is considering adopting Kafka as our message bus, I have been assigned the task of performing some benchmark tests. I basically followed what Jay wrote in this article: <http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines>
The benchmarks were set up using 4 nodes, with one node acting as both producer and consumer while the other three function as Kafka brokers.

This is the baseline (50M messages of 100 bytes each, 64 MB buffer memory, and 8192 batch size):

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test1 50000000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092 buffer.memory=67108864 batch.size=8192

which on our setup yielded:

50000000 records sent, 265939.057406 records/sec (25.36 MB/sec)

However, after doubling buffer.memory to 128 MB:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test1 50000000 100 -1 acks=1 bootstrap.servers=192.168.1.1:9092 buffer.memory=134217728 batch.size=8192

the throughput dropped significantly:

50000000 records sent, 93652.601295 records/sec (8.93 MB/sec)

Can anyone explain why the throughput degraded so much?

Likewise, when benchmarking with 3 partitions spread across the 3 broker nodes, the maximum throughput reported is roughly 33.2 MB/sec, whereas a single partition (on a single node) yields 100 MB/sec. My guess is that on the 3-node setup I need to multiply the 33.2 MB/sec reading by 3, since that figure only represents the bandwidth seen by a single node. Again, would anyone be willing to shed some light on how to interpret the numbers correctly?

Cheers,
Paul
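P.S. For reference, here is the quick sanity check I did on the throughput arithmetic (assuming the tool reports MB as 1024*1024 bytes, which matches the numbers above):

```python
def mb_per_sec(records_per_sec, record_size_bytes):
    """Convert a records/sec rate into MB/sec for fixed-size records."""
    return records_per_sec * record_size_bytes / (1024 * 1024)

# Baseline run: 265939.06 records/sec at 100 bytes each
print(round(mb_per_sec(265939.057406, 100), 2))  # -> 25.36

# Degraded run: 93652.60 records/sec at 100 bytes each
print(round(mb_per_sec(93652.601295, 100), 2))   # -> 8.93

# If each of the 3 brokers reports ~33.2 MB/sec, the aggregate would be:
print(round(3 * 33.2, 1))                        # -> 99.6, close to the 100 MB/sec single-node figure
```

which is what makes me suspect the 33.2 MB/sec reading is per-node rather than cluster-wide.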