I am using the group management feature of Kafka 0.9 to handle partition
assignment to consumer instances, and I use the subscribe() API to subscribe
to the topic I want to read from. My environment has 3 Kafka brokers and a
couple of ZooKeeper nodes, and I created a topic with 9 partitions. The
performance tests attempt to send 9 parallel poll() requests to the Kafka
brokers every second. The results show that the first poll() on each consumer
takes around 30 seconds and returns 0 records. Also, when I print the
partition assignment for that consumer instance at that point, I see no
partitions assigned to it. The next poll() returns quickly (~10-20 ms) with
data and with some partitions assigned.
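
A minimal sketch of the pattern described above, to make the measurement
concrete. It is illustrative only, not the actual test code: the broker
address, group id, topic name, String deserializers, and the 1000 ms poll
timeout are all placeholder assumptions.

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FirstPollTiming {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder values; substitute the real cluster settings.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "perf-test-group");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        try {
            // Group-managed assignment: partitions are handed out by the
            // group coordinator, not chosen by this client.
            consumer.subscribe(Arrays.asList("perf-test-topic"));

            // Time the first two polls and print what is assigned after each.
            for (int i = 1; i <= 2; i++) {
                long start = System.currentTimeMillis();
                // poll(long) is the 0.9 signature; the timeout is an assumption.
                ConsumerRecords<String, String> records = consumer.poll(1000);
                long elapsedMs = System.currentTimeMillis() - start;
                System.out.println("poll " + i + ": " + records.count()
                        + " records in " + elapsedMs + " ms, assignment="
                        + consumer.assignment());
            }
        } finally {
            consumer.close();
        }
    }
}
```

Running this needs the kafka-clients jar on the classpath and a reachable
broker.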

With each consumer taking 30 seconds, the performance tests report very low
throughput. I run the tests for around 1000 seconds, during which I produce
messages on the topic for the complete duration, and I start the parallel
consume requests after 400 seconds. So out of 400 seconds, with 9 consumers
taking 30 seconds each, around 270 seconds are spent in the first poll without
any data. Is it because of the rebalance operation that the consumers are
blocked in poll()? What is the best way to use poll() if I have to serve
many parallel requests per second? Should I prefer manual assignment of
partitions in this case instead of relying on rebalance?
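
For comparison, manual assignment would look roughly like the sketch below,
which uses assign() instead of subscribe() so the consumer does not join a
consumer group. The broker address, topic name, deserializers, poll timeout,
and the way the partition number is passed in are placeholder assumptions, and
whether this removes the delay in this setup is untested.

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ManualAssignSketch {
    public static void main(String[] args) {
        // Hypothetical convention: each of the 9 consumers is started with
        // its own partition number (0-8) as the first argument.
        int partition = args.length > 0 ? Integer.parseInt(args[0]) : 0;

        Properties props = new Properties();
        // Placeholder values; substitute the real cluster settings.
        props.put("bootstrap.servers", "localhost:9092");
        // No group is used here, so offset commits are switched off;
        // tracking the read position is then up to the application.
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        try {
            // Pin this consumer to one partition; no group rebalance occurs.
            consumer.assign(Arrays.asList(
                    new TopicPartition("perf-test-topic", partition)));

            long start = System.currentTimeMillis();
            ConsumerRecords<String, String> records = consumer.poll(1000);
            long elapsedMs = System.currentTimeMillis() - start;
            System.out.println("first poll: " + records.count()
                    + " records in " + elapsedMs + " ms");
        } finally {
            consumer.close();
        }
    }
}
```

The trade-off is that the application, not the group coordinator, has to
decide which consumer reads which partition and has to handle consumers that
fail.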


Regards,

Rohit Sardesai
