Can anybody help out on this?

________________________________
From: Rohit Sardesai
Sent: 19 June 2016 11:47:01
To: users@kafka.apache.org
Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

In my tests, I am using around 24 consumer groups. I never call consumer.close() or consumer.unsubscribe() until the application is shutting down, so the consumers never leave, but new consumer instances do get created as the parallel requests pile up. I also reuse consumer instances when they are idle (i.e. not serving any consume request). With 9 partitions, I issue 9 parallel consume requests every second under the same consumer group.

To summarize, I have the following test setup: 3 Kafka brokers, 2 ZooKeeper nodes, 1 topic, 9 partitions, 24 consumer groups, and 9 consume requests at a time.

________________________________
From: Dana Powers <dana.pow...@gmail.com>
Sent: 19 June 2016 10:45
To: users@kafka.apache.org
Subject: Re: consumer.poll() takes approx. 30 seconds - 0.9 new consumer api

Is your test reusing a group name? And if so, are your consumer instances gracefully leaving? If they are not, subsequent 'rebalance' operations may block until those old consumers check in or the session timeout (30 seconds) expires.

-Dana

On Jun 18, 2016 8:56 PM, "Rohit Sardesai" <rohit.sarde...@outlook.com> wrote:

> I am using the group management feature of Kafka 0.9 to handle partition
> assignment to consumer instances. I use the subscribe() API to subscribe to
> the topic I am interested in reading data from. I have an environment
> with 3 Kafka brokers and a couple of ZooKeeper nodes. I created
> a topic with 9 partitions. The performance tests attempt to send 9
> parallel poll() requests to the Kafka brokers every second. The results
> show that each poll() operation takes around 30 seconds the first time
> it polls and returns 0 records. Also, when I print the partition
> assignment of this consumer instance, I see no partitions assigned to it.
> The next poll() does return quickly (~10-20 ms) with data and some
> partitions assigned to it.
>
> With each consumer taking 30 seconds, the performance tests report very
> low throughput. I run the tests for around 1000 seconds, during which I
> produce messages on the topic for the complete duration, and I start the
> parallel consume requests after 400 seconds. So out of 400 seconds, with 9
> consumers taking 30 seconds each, around 270 seconds are spent in the
> first poll without any data. Is this because of the re-balance operation
> that the consumers are blocked on the poll()? What is the best way to use
> poll() if I have to serve many parallel requests per second? Should I
> prefer manual assignment of partitions in this case instead of relying on
> re-balance?
>
> Regards,
>
> Rohit Sardesai
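
On the manual-assignment question raised in the original message, here is a minimal sketch of how the 9 partitions could be split across a fixed set of long-lived consumers. The class and method names are hypothetical and not part of the Kafka client, and the worker count of 3 is an assumption chosen only for illustration. It shows only the partition arithmetic. In a real client, each resulting list would be wrapped in TopicPartition objects and passed to KafkaConsumer.assign() in place of subscribe(), which bypasses group rebalancing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Kafka client API.
class ManualAssignmentSketch {

    // Returns the partition numbers that worker `workerIndex` (0-based) would own
    // when `partitionCount` partitions are dealt round-robin to `workerCount` workers.
    static List<Integer> partitionsFor(int workerIndex, int workerCount, int partitionCount) {
        if (workerCount <= 0 || workerIndex < 0 || workerIndex >= workerCount) {
            throw new IllegalArgumentException("invalid worker index or count");
        }
        List<Integer> owned = new ArrayList<>();
        for (int p = workerIndex; p < partitionCount; p += workerCount) {
            owned.add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        int partitionCount = 9; // from the test setup described in this thread
        int workerCount = 3;    // assumed number of long-lived consumers
        for (int w = 0; w < workerCount; w++) {
            // Each list would become the TopicPartition set given to
            // KafkaConsumer.assign(...) for that worker.
            System.out.println("worker " + w + " -> "
                    + partitionsFor(w, workerCount, partitionCount));
        }
    }
}
```

The trade-off is that manually assigned consumers get no automatic failover: if a worker dies, its partitions stay unread until the application reassigns them itself.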