Hi,

We are using Kafka 0.7.2 with ZooKeeper 3.4.5.

Our cluster configuration: 3 brokers on 3 different machines. Each broker machine also runs a ZooKeeper instance. We have 15 topics defined. We are trying to use them as queues (JMS-like) by defining the same group across different Kafka consumers. On the consumer side, we are using the High Level Consumer.

However, we are seeing weird behaviour. One of our heavily used queues (event_queue) has 2 dedicated consumers listening to that queue only. This queue is defined with 150 partitions on each broker, and the number of streams defined on the 2 dedicated consumers is 150. After a while we see that most of the consumer threads keep waiting for events and the lag keeps growing. If we kill one of the dedicated consumers, the other consumer starts receiving messages rapidly. The consumers had no full GCs.

How we measure lag:

    bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group event_queue --zkconnect zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/kafka --topic event_queue

Around the time the events stopped coming to the new consumer, this was printed in the logs:

    [INFO] zookeeper state changed (Disconnected)
    [INFO] zookeeper state changed (Disconnected)
    [INFO] zookeeper state changed (SyncConnected)
    [INFO] zookeeper state changed (SyncConnected)

Config overridden:

    Consumer:
      fetch.size=3MB
      autooffset.reset=largest
      autocommit.interval.ms=500

    Producer:
      maxMessageSize=3MB

Please let us know if we are doing something wrong or facing a known issue here.

Thanks,
Nihit
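
For reference, the partition and stream counts described above can be worked out with a small sketch. This is only an illustration in Python, not Kafka code: the function name assignment_summary and its parameters are hypothetical, it assumes "150 streams" means 150 per consumer, and it assumes partitions are split evenly (range-style) across all streams in the group.

```python
import math


def assignment_summary(brokers, partitions_per_broker, consumers, streams_per_consumer):
    """Summarize how partitions would spread over consumer streams.

    Assumption: an even, range-style split of all partitions across all
    streams in the consumer group. Illustration only, not Kafka's code.
    """
    # Partitions are configured per broker, so the topic total is the product.
    total_partitions = brokers * partitions_per_broker
    # Every stream (consumer thread) in the group competes for partitions.
    total_streams = consumers * streams_per_consumer
    return {
        "total_partitions": total_partitions,
        "total_streams": total_streams,
        "min_partitions_per_stream": total_partitions // total_streams,
        "max_partitions_per_stream": math.ceil(total_partitions / total_streams),
        # Streams left without any partition when streams outnumber partitions.
        "idle_streams": max(0, total_streams - total_partitions),
    }


if __name__ == "__main__":
    # Two dedicated consumers, as in the setup described above.
    print(assignment_summary(3, 150, 2, 150))
    # One consumer left after the other is killed.
    print(assignment_summary(3, 150, 1, 150))
```

Under these assumptions, the two-consumer case gives 450 partitions over 300 streams (one or two partitions per stream), and the single-consumer case gives 3 partitions per stream, so no stream should be left without a partition in either case.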