Hi Tom,

Thank you for your help. I have only one broker. I used the Kafka production server configuration listed on Kafka's documentation page: http://kafka.apache.org/documentation.html#prodconfig . I increased the flush interval and the number of messages between flushes to prevent the disk from becoming the bottleneck. For the consumers, I used the following configuration:

Properties props = new Properties();
props.put("enable.auto.commit", "true");
props.put("request.timeout.ms", "500000000");
props.put("session.timeout.ms", "50000000");
props.put("connections.max.idle.ms", "50000000");
props.put("fetch.min.bytes", "1");
props.put("fetch.max.wait.ms", "500");
props.put("group.id", "gid");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
props.put("max.partition.fetch.bytes", "128");
consumer = new KafkaConsumer<String, String>(props);
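As a side note, a cleaner way to get one record per poll (assuming you can run a 0.10+ client, where KIP-41 added max.poll.records) is to cap the record count directly rather than shrinking max.partition.fetch.bytes. A minimal sketch of such a config, with a hypothetical helper class; the resulting Properties would be passed to new KafkaConsumer<>(props) as usual:

```java
import java.util.Properties;

// Hypothetical helper: builds a consumer config that caps each poll()
// at one record via max.poll.records (available in Kafka 0.10+, KIP-41).
// Unlike a 128-byte max.partition.fetch.bytes, this does not throttle
// the size of the underlying fetch requests to the broker.
class SingleRecordConsumerConfig {

    static Properties build(String groupId) {
        Properties props = new Properties();
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "true");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Each call to poll() returns at most one record; the broker can
        // still ship full-size fetch responses, so throughput is unaffected.
        props.put("max.poll.records", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("gid");
        System.out.println(props.getProperty("max.poll.records"));
    }
}
```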
I am setting max.partition.fetch.bytes to 128 because I only want to process one record for each poll. Thanks a lot for your help. I really appreciate it.

On Tue, May 24, 2016 at 7:51 AM, Tom Crayford <tcrayf...@heroku.com> wrote:

> What's your server setup for the brokers and consumers? Generally I'd
> expect something to be exhausted here and that to end up being the
> bottleneck.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>
> On Mon, May 23, 2016 at 7:32 PM, Yazeed Alabdulkarim <
> y.alabdulka...@gmail.com> wrote:
>
> > Hi,
> > I am running simple experiments to evaluate the scalability of Kafka
> > consumers with respect to the number of partitions. I assign every
> > consumer to a specific partition. Each consumer polls the records in
> > its assigned partition and prints the first one, then polls again from
> > the offset of the printed record until all records are printed. Prior
> > to running the test, I produce 10 million records evenly among the
> > partitions. After running the test, I measure the time it took for the
> > consumers to print all the records. I was expecting Kafka to scale as
> > I increase the number of consumers/partitions. However, the
> > scalability diminishes once the number of partitions/consumers grows
> > beyond a certain point. Going from 1 to 2, 4, and 8, the scalability
> > is great: the duration of the test is reduced by the factor of
> > increase in the number of partitions/consumers. Beyond 8
> > consumers/partitions, however, the test duration reaches a steady
> > state. I am monitoring the resources of my server and didn't see any
> > bottleneck. Am I missing something here? Shouldn't Kafka consumers
> > scale with the number of partitions?
> >
> > --
> > Best Regards,
> > Yazeed Alabdulkarim

--
Best Regards,
Yazeed Alabdulkarim