Hi Guozhang, Thank you so much for your reply. Follow up questions:
1. We are using default onPartitionAssigned callback, not customized one. What do you think would be the possible reason of the heavy callback? Can you point me the actual implementation class for Java 0.9.0.1 driver? 2. Does consumer group information store in zookeeper? If it is, do you think it is possible that the session timeout is because of the zookeeper? Can you tell me the actual znode that stores the consumer group information? Thanks, Jun On Sat, Nov 19, 2016 at 9:32 AM, Guozhang Wang <wangg...@gmail.com> wrote: > Hello Jun, > > The reason that the group is bouncing between generation 0 and 1 is that it > is iteratively removed as an empty group after the only member is removed, > and then re-create the group with the member jumping back to re-join. > > My suspect is that your PartitionAssigned callback is too heavy, that it > takes more than the session timeout. Note that the coordinator start > ticking on the member right after the partition is assigned, but the > consumer will not start heartbeating until the callback is completed. > > As for the "Correlation id for response (767587) does not match request > (767585)" issue, it is indeed a bit weird and I have not seen it before. > Would need to investigate further from the logs. > > Currently there is no tools for searching for all the active consumer > groups for a given topic, one has to query all the consumer groups and > filter by their subscribed topics. > > > Guozhang > > > > > On Fri, Nov 18, 2016 at 4:19 PM, Jun MA <mj.saber1...@gmail.com> wrote: > > > Hi guys, > > > > I’m using kafka 0.9.0.1 and Java client. I saw the following exceptions > > throw by my consumer: > > Caused by: java.lang.IllegalStateException: Correlation id for response > > (767587) does not match request (767585) > > at org.apache.kafka.clients.NetworkClient.correlate( > > NetworkClient.java:477) > > at org.apache.kafka.clients.NetworkClient. > handleCompletedReceives( > > NetworkClient.java:440) > > at org.apache.kafka.clients.NetworkClient.poll( > > NetworkClient.java:265) > > at org.apache.kafka.clients.consumer.internals. > > ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320) > > at org.apache.kafka.clients.consumer.internals. > > ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213) > > at org.apache.kafka.clients.consumer.internals. > > ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193) > > at org.apache.kafka.clients.consumer.KafkaConsumer. > > pollOnce(KafkaConsumer.java:908) > > at org.apache.kafka.clients.consumer.KafkaConsumer.poll( > > KafkaConsumer.java:853) > > at com.hulu.flintan.metadatacache.kafka. > > AssetChangeEventConsumer.run(AssetChangeEventConsumer.java:47) > > 35 seconds later, I started seeing > > > > Error ILLEGAL_GENERATION occurred while committing offsets for group > > flintan-metadatacache-5b9d551b-bd4a-41fa-812f-3989c76c3856 > > > > happens all the time. > > > > I then checked the server side log, it shows > > > > [2016-11-17 06:17:28,868] INFO [GroupCoordinator 2]: Preparing to > > restabilize group flintan-metadatacache-5b9d551b-bd4a-41fa-812f- > 3989c76c3856 > > with old generation 1 (kafka.coordinator.GroupCoordinator) > > [2016-11-17 06:17:28,869] INFO [GroupCoordinator 2]: Group > > flintan-metadatacache-5b9d551b-bd4a-41fa-812f-3989c76c3856 generation 1 > > is dead and removed (kafka.coordinator.GroupCoordinator) > > [2016-11-17 06:17:42,396] INFO [GroupCoordinator 2]: Preparing to > > restabilize group flintan-metadatacache-5b9d551b-bd4a-41fa-812f- > 3989c76c3856 > > with old generation 0 (kafka.coordinator.GroupCoordinator) > > [2016-11-17 06:17:42,396] INFO [GroupCoordinator 2]: Stabilized group > > flintan-metadatacache-5b9d551b-bd4a-41fa-812f-3989c76c3856 generation 1 > > (kafka.coordinator.GroupCoordinator) > > [2016-11-17 06:17:42,399] INFO [GroupCoordinator 2]: Assignment received > > from leader for group flintan-metadatacache-5b9d551b-bd4a-41fa-812f- > 3989c76c3856 > > for generation 1 (kafka.coordinator.GroupCoordinator) > > [2016-11-17 06:18:12,404] INFO [GroupCoordinator 2]: Preparing to > > restabilize group flintan-metadatacache-5b9d551b-bd4a-41fa-812f- > 3989c76c3856 > > with old generation 1 (kafka.coordinator.GroupCoordinator) > > > > over and over all the time. It looks like the consumer group is bouncing > > between generation 0 and 1, and it stop consuming anything. This consumer > > group only have 1 consumer with it. > > > > We are using > > auto commit with interval 1000ms > > session timeout 30000ms > > heartbeat interval 3000ms > > > > My questions are: > > 1. Why this happens and how to prevent it happening again? > > 2. If it happens, how should I react in this case? Catch > IllegalGeneration > > exception and resubscribe the topic? Or recreate the consumer w/ same (or > > different) consumer group id? > > 3. Where can I find the active consumer group for a topic? Does that > store > > in zookeeper? > > > > Thanks, > > Jun > > > > > -- > -- Guozhang >