Hello Jun,

The reason that the group is bouncing between generation 0 and 1 is that it
is iteratively removed as an empty group after the only member is removed,
and then re-create the group with the member jumping back to re-join.

My suspect is that your PartitionAssigned callback is too heavy, that it
takes more than the session timeout. Note that the coordinator start
ticking on the member right after the partition is assigned, but the
consumer will not start heartbeating until the callback is completed.

As for the "Correlation id for response (767587) does not match request
(767585)" issue, it is indeed a bit weird and I have not seen it before.
Would need to investigate further from the logs.

Currently there is no tools for searching for all the active consumer
groups for a given topic, one has to query all the consumer groups and
filter by their subscribed topics.


Guozhang




On Fri, Nov 18, 2016 at 4:19 PM, Jun MA <mj.saber1...@gmail.com> wrote:

> Hi guys,
>
> I’m using kafka 0.9.0.1 and Java client. I saw the following exceptions
> throw by my consumer:
> Caused by: java.lang.IllegalStateException: Correlation id for response
> (767587) does not match request (767585)
>         at org.apache.kafka.clients.NetworkClient.correlate(
> NetworkClient.java:477)
>         at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(
> NetworkClient.java:440)
>         at org.apache.kafka.clients.NetworkClient.poll(
> NetworkClient.java:265)
>         at org.apache.kafka.clients.consumer.internals.
> ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>         at org.apache.kafka.clients.consumer.internals.
> ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>         at org.apache.kafka.clients.consumer.internals.
> ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.
> pollOnce(KafkaConsumer.java:908)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.poll(
> KafkaConsumer.java:853)
>         at com.hulu.flintan.metadatacache.kafka.
> AssetChangeEventConsumer.run(AssetChangeEventConsumer.java:47)
> 35 seconds later, I started seeing
>
> Error ILLEGAL_GENERATION occurred while committing offsets for group
> flintan-metadatacache-5b9d551b-bd4a-41fa-812f-3989c76c3856
>
> happens all the time.
>
> I then checked the server side log, it shows
>
> [2016-11-17 06:17:28,868] INFO [GroupCoordinator 2]: Preparing to
> restabilize group flintan-metadatacache-5b9d551b-bd4a-41fa-812f-3989c76c3856
> with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-11-17 06:17:28,869] INFO [GroupCoordinator 2]: Group
> flintan-metadatacache-5b9d551b-bd4a-41fa-812f-3989c76c3856 generation 1
> is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-11-17 06:17:42,396] INFO [GroupCoordinator 2]: Preparing to
> restabilize group flintan-metadatacache-5b9d551b-bd4a-41fa-812f-3989c76c3856
> with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-11-17 06:17:42,396] INFO [GroupCoordinator 2]: Stabilized group
> flintan-metadatacache-5b9d551b-bd4a-41fa-812f-3989c76c3856 generation 1
> (kafka.coordinator.GroupCoordinator)
> [2016-11-17 06:17:42,399] INFO [GroupCoordinator 2]: Assignment received
> from leader for group 
> flintan-metadatacache-5b9d551b-bd4a-41fa-812f-3989c76c3856
> for generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-11-17 06:18:12,404] INFO [GroupCoordinator 2]: Preparing to
> restabilize group flintan-metadatacache-5b9d551b-bd4a-41fa-812f-3989c76c3856
> with old generation 1 (kafka.coordinator.GroupCoordinator)
>
> over and over all the time. It looks like the consumer group is bouncing
> between generation 0 and 1, and it stop consuming anything. This consumer
> group only have 1 consumer with it.
>
> We are using
> auto commit with interval 1000ms
> session timeout 30000ms
> heartbeat interval 3000ms
>
> My questions are:
> 1. Why this happens and how to prevent it happening again?
> 2. If it happens, how should I react in this case? Catch IllegalGeneration
> exception and resubscribe the topic? Or recreate the consumer w/ same (or
> different) consumer group id?
> 3. Where can I find the active consumer group for a topic? Does that store
> in zookeeper?
>
> Thanks,
> Jun




-- 
-- Guozhang

Reply via email to