[
https://issues.apache.org/jira/browse/KAFKA-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057055#comment-15057055
]
Ismael Juma commented on KAFKA-2985:
------------------------------------
Thanks for the report. Would you be able to test the 0.9.0 branch? It contains
a number of important fixes since the 0.9.0.0 release.
> Consumer group stuck in rebalancing state
> -----------------------------------------
>
> Key: KAFKA-2985
> URL: https://issues.apache.org/jira/browse/KAFKA-2985
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.9.0.0
> Environment: Kafka 0.9.0.0.
> Kafka Java consumer 0.9.0.0
> 2 Java producers.
> 3 Java consumers using the new consumer API.
> 2 kafka brokers.
> Reporter: Jens Rantil
> Assignee: Neha Narkhede
>
> We've doing some load testing on Kafka. _After_ the load test when our
> consumers and have two times now seen Kafka become stuck in consumer group
> rebalancing. This is after all our consumers are done consuming and
> essentially polling periodically without getting any records.
> The brokers list the consumer group (named "default"), but I can't query the
> offsets:
> {noformat}
> jrantil@queue-0:/srv/kafka/kafka$ ./bin/kafka-consumer-groups.sh
> --new-consumer --bootstrap-server localhost:9092 --list
> default
> jrantil@queue-0:/srv/kafka/kafka$ ./bin/kafka-consumer-groups.sh
> --new-consumer --bootstrap-server localhost:9092 --describe --group
> default|sort
> Consumer group `default` does not exist or is rebalancing.
> {noformat}
> Retrying to query the offsets for 15 minutes or so still said it was
> rebalancing. After restarting our first broker, the group immediately started
> rebalancing. That broker was logging this before restart:
> {noformat}
> [2015-12-12 13:09:48,517] INFO [Group Metadata Manager on Broker 0]: Removed
> 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
> [2015-12-12 13:10:16,139] INFO [GroupCoordinator 0]: Stabilized group default
> generation 16 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:10:16,141] INFO [GroupCoordinator 0]: Assignment received from
> leader for group default for generation 16
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:10:16,575] INFO [GroupCoordinator 0]: Preparing to restabilize
> group default with old generation 16 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,141] INFO [GroupCoordinator 0]: Stabilized group default
> generation 17 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,143] INFO [GroupCoordinator 0]: Assignment received from
> leader for group default for generation 17
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,314] INFO [GroupCoordinator 0]: Preparing to restabilize
> group default with old generation 17 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,144] INFO [GroupCoordinator 0]: Stabilized group default
> generation 18 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,145] INFO [GroupCoordinator 0]: Assignment received from
> leader for group default for generation 18
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,340] INFO [GroupCoordinator 0]: Preparing to restabilize
> group default with old generation 18 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,146] INFO [GroupCoordinator 0]: Stabilized group default
> generation 19 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,148] INFO [GroupCoordinator 0]: Assignment received from
> leader for group default for generation 19
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,238] INFO [GroupCoordinator 0]: Preparing to restabilize
> group default with old generation 19 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,148] INFO [GroupCoordinator 0]: Stabilized group default
> generation 20 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,149] INFO [GroupCoordinator 0]: Assignment received from
> leader for group default for generation 20
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,360] INFO [GroupCoordinator 0]: Preparing to restabilize
> group default with old generation 20 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,150] INFO [GroupCoordinator 0]: Stabilized group default
> generation 21 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,152] INFO [GroupCoordinator 0]: Assignment received from
> leader for group default for generation 21
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,217] INFO [GroupCoordinator 0]: Preparing to restabilize
> group default with old generation 21 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:16:10,152] INFO [GroupCoordinator 0]: Stabilized group default
> generation 22 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:16:10,154] INFO [GroupCoordinator 0]: Assignment received from
> leader for group default for generation 22
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:16:10,339] INFO [GroupCoordinator 0]: Preparing to restabilize
> group default with old generation 22 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:17:09,155] INFO [GroupCoordinator 0]: Stabilized group default
> generation 23 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:17:09,157] INFO [GroupCoordinator 0]: Assignment received from
> leader for group default for generation 23
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:17:09,262] INFO [GroupCoordinator 0]: Preparing to restabilize
> group default with old generation 23 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:18:08,157] INFO [GroupCoordinator 0]: Stabilized group default
> generation 24 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:18:08,159] INFO [GroupCoordinator 0]: Assignment received from
> leader for group default for generation 24
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:18:08,333] INFO [GroupCoordinator 0]: Preparing to restabilize
> group default with old generation 24 (kafka.coordinator.GroupCoordinator)
> {noformat}
> Our consumers were logging:
> {noformat}
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator Marking the
> coordinator 2147483647 dead.
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Error
> UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Auto offset
> commit failed: Commit cannot be completed due to group rebalance
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator Marking the
> coordinator 2147483647 dead.
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Error
> UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Auto offset
> commit failed: Commit cannot be completed due to group rebalance
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Error
> UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Auto offset
> commit failed:
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator Attempt to
> join group default failed due to unknown member id, resetting and retrying.
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Error
> UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Auto offset
> commit failed:
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer]
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator Attempt to
> join group default failed due to unknown member id, resetting and retrying.
> {noformat}
> I understand that the broker might start rebalancing if my consumers hasn't
> reported heartbeat in session timeout. This might well have happened during
> my load test. However, the issue here is that the rebalancing doesn't
> stabilize/finish after the load test is done.
> Let me know if I can be of any assistance to track this down.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)