I am using the group management feature of Kafka 0.9 to handle partition assignment to consumer instances. I use the subscribe() API to subscribe to the topic I want to read from. My environment has 3 Kafka brokers and a couple of ZooKeeper nodes, and I created a topic with 9 partitions. The performance tests attempt to send 9 parallel poll() requests to the Kafka brokers every second. The results show that the first poll() on each consumer takes around 30 seconds and returns 0 records. Also, when I print the partition assignment for that consumer instance at this point, I see no partitions assigned to it. The next poll() returns quickly (~10-20 ms) with data, and the consumer has some partitions assigned.
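A minimal sketch of how per-call latency like this can be measured. It is not the actual test harness: the `Supplier` is a hypothetical stand-in for a call such as `consumer.poll(timeout)`, and the sleep durations in `main` are made-up values chosen only to imitate a slow first call followed by fast ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class PollTimer {
    /** Invokes the call `attempts` times and returns each attempt's elapsed wall-clock time in milliseconds. */
    static List<Long> timeCalls(Supplier<Integer> pollCall, int attempts) {
        List<Long> elapsedMs = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            long start = System.nanoTime();
            pollCall.get();
            elapsedMs.add((System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for consumer.poll(timeout):
        // the first call is slow and returns 0 records, later calls are fast.
        final boolean[] first = {true};
        Supplier<Integer> fakePoll = () -> {
            boolean wasFirst = first[0];
            first[0] = false;
            try {
                Thread.sleep(wasFirst ? 200 : 5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return wasFirst ? 0 : 10; // pretend record count
        };

        System.out.println("elapsed ms per poll: " + timeCalls(fakePoll, 3));
    }
}
```

Timing each call separately like this makes it easy to see whether only the first poll() is slow or whether later calls are affected too.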
With each consumer taking 30 seconds, the performance tests report very low throughput. I run the tests for around 1000 seconds, producing messages on the topic for the complete duration, and I start the parallel consume requests after 400 seconds. So out of 400 seconds, with 9 consumers taking 30 seconds each, around 270 seconds are spent in the first poll without any data. Are the consumers blocked in poll() because of the rebalance operation? What is the best way to use poll() if I have to serve many parallel requests per second? Should I prefer manual assignment of partitions in this case instead of relying on rebalancing?

Regards,
Rohit Sardesai
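For reference, a minimal sketch of what the manual-assignment option could look like for the 9-partition topic. It only computes which partition ids each consumer would own under a simple round-robin split; the class and method names are hypothetical, and the round-robin scheme is an assumption, not something Kafka prescribes.

```java
import java.util.ArrayList;
import java.util.List;

class StaticPartitionSplit {
    /** Returns the partition ids that consumer `consumerIndex` (0-based) owns under a round-robin split. */
    static List<Integer> partitionsFor(int consumerIndex, int consumerCount, int partitionCount) {
        if (consumerCount <= 0 || consumerIndex < 0 || consumerIndex >= consumerCount) {
            throw new IllegalArgumentException("consumerIndex must be in [0, consumerCount)");
        }
        List<Integer> owned = new ArrayList<>();
        // Consumer i takes partitions i, i + consumerCount, i + 2 * consumerCount, ...
        for (int p = consumerIndex; p < partitionCount; p += consumerCount) {
            owned.add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        int partitionCount = 9; // topic size from the setup described above
        int consumerCount = 3;  // assumed number of consumers, for illustration only
        for (int c = 0; c < consumerCount; c++) {
            System.out.println("consumer " + c + " -> " + partitionsFor(c, consumerCount, partitionCount));
        }
    }
}
```

With 3 consumers this yields [0, 3, 6], [1, 4, 7] and [2, 5, 8]. Each id would then be wrapped in a TopicPartition for the topic and passed to the consumer's assign() method instead of calling subscribe(), so the consumer would not take part in group rebalancing.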