Hello,

No, in all tests I used two partitions per topic, one per broker. The only variable in the tests was the number of topics. From what I've read, what really affects performance is the total number of partitions per broker (probably including replicas), so 1 topic with 1000 partitions should offer much the same performance characteristics as 1000 topics with a single partition each (provided, of course, that we use those partitions in the same manner). Do your findings contradict this?
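
A rough sketch of that arithmetic in Python, assuming partitions and their replicas are spread evenly across the brokers (the function name is hypothetical; the replication factor and broker count are the ones from this thread):

```python
def replicas_per_broker(topics, partitions_per_topic, replication_factor, brokers):
    """Average number of partition replicas each broker hosts.

    Assumes an even spread across brokers; this is an assumption,
    not something measured on the cluster.
    """
    total_replicas = topics * partitions_per_topic * replication_factor
    return total_replicas / brokers


# 1 topic with 1000 partitions vs. 1000 topics with 1 partition each,
# both with replication factor 2 on 2 brokers.
one_big_topic = replicas_per_broker(1, 1000, 2, 2)
many_small_topics = replicas_per_broker(1000, 1, 2, 2)

# The tested setup: 1000 topics with 2 partitions each.
tested_setup = replicas_per_broker(1000, 2, 2, 2)

print(one_big_topic, many_small_topics, tested_setup)
```

By this count the two layouts put the same load on each broker, which is why the topic count alone should not matter if the per-broker partition total is what drives performance.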

As for metadata exchange: do you mean communication with Zookeeper, or do the nodes in the cluster exchange per-partition metadata directly with each other?

Chris

On Tue, 2016-07-19 at 23:47 -0700, R Krishna wrote:
> We did similar testing recently, newbie here. Assuming you used an async
> publisher, did you also test with multiple partitions (1-1000) per topic?
> More topics imply more metadata per topic exchanged every minute, and
> more batches maintained and flushed per topic+partition per producer, so
> higher CPU/memory usage. What we noticed was that for the same topic, more
> producers gave more throughput, and increasing partitions reduced
> throughput, probably for similar reasons as above. You may want to try
> again, increasing these metadata exchange timeouts and memory as you
> increase topics/partitions.
>
> On Tue, Jul 19, 2016 at 2:33 PM, Krzysztof Nawara <krzysztof.naw...@cern.ch>
> wrote:
>
> > Hi,
> >
> > I have been testing Kafka in order to determine how the number of
> > partitions affects performance. To do that, I set up 2 Kafka nodes and
> > 1 Zookeeper node, and used 8 producers running on different machines
> > to send messages.
> > In the first scenario, my producers sent messages to the brokers,
> > selecting topics in round-robin fashion, so each message would go to a
> > different topic. In the second scenario I created just as many topics,
> > but all producers concentrated on sending to just one of them. In the
> > first case, with 1000 topics per cluster, throughput was ~250 times
> > smaller than in the 1-topic/cluster scenario. In the second one it was
> > more like 1.5-2 times slower - a huge difference.
> > Do you have any ideas what might be the cause? Two things I came up with
> > were: under-utilization of producer batching and a lot of random IO on
> > the brokers. Both are just wild guesses. I have been using the stock
> > configuration - can you think of any particular properties I should
> > play with?
> >
> > Details:
> > Machines (Kafka, Zk): STRATOS S810-X52L (32 cores, 64GB RAM), data stored
> > on a single dedicated SATA drive
> > Machines (producers): OpenStack VMs (4 vCPUs, 8GB, 40GB on SSD)
> > Partitions per topic: 2
> > Replication factor: 2
> >
> > Cordially,
> > Chris
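
On the batching guess in the quoted message: the producer keeps a separate batch per topic-partition, so spreading the same traffic over more partitions leaves fewer records in each batch. A minimal sketch of that effect, assuming records are spread evenly over every partition being written to (the function name and the 10,000-record window are hypothetical, not measurements):

```python
def records_per_batch(records_in_window, topics, partitions_per_topic):
    """Average records that land in each per-partition batch.

    Assumes the producer spreads its records evenly over every
    topic-partition it writes to within one flush window.
    """
    return records_in_window / (topics * partitions_per_topic)


# Hypothetical number of records one producer sends per flush window.
window = 10_000

# All traffic to a single topic with 2 partitions.
single_topic = records_per_batch(window, 1, 2)

# Round-robin over 1000 topics with 2 partitions each.
round_robin = records_per_batch(window, 1000, 2)

print(single_topic, round_robin)
```

This only illustrates the batching guess. It ignores the batch size limit and says nothing about broker-side IO, so it should not be read as an explanation of the measured difference.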