[ https://issues.apache.org/jira/browse/KAFKA-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106758#comment-15106758 ]
Cosmin Marginean edited comment on KAFKA-2985 at 1/19/16 2:05 PM:
------------------------------------------------------------------

I can confirm that our processing takes tens of seconds, which is probably why we ran into this. Thanks, Michal, for the further diagnosis.

I read some of the content in these linked discussions. The large number of messages is one thing that, it seems, will be addressed with max.poll.records. However, long processing (of a single message) still seems like something that will need to go into a separate thread, which adds complexity to the design.


was (Author: cosmin.marginean):
I can confirm that our processing takes tens of seconds so then it's probably why we have run into this. Thanks Michal for further diagnosis.

I read some of the content in this linked discussions. Large number of messages is one thing that seems it will be addressed with max.poll.records. However long processing (of a single messages) still seems like something that will need to go out in a separate thread, which adds complexity to the design.

> Consumer group stuck in rebalancing state
> -----------------------------------------
>
> Key: KAFKA-2985
> URL: https://issues.apache.org/jira/browse/KAFKA-2985
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.9.0.0
> Environment: Kafka 0.9.0.0.
> Kafka Java consumer 0.9.0.0
> 2 Java producers.
> 3 Java consumers using the new consumer API.
> 2 kafka brokers.
> Reporter: Jens Rantil
> Assignee: Jason Gustafson
>
> We've been doing some load testing on Kafka. _After_ the load test, we have
> now twice seen Kafka become stuck in consumer group rebalancing. This is
> after all our consumers are done consuming and are essentially polling
> periodically without getting any records.
> The brokers list the consumer group (named "default"), but I can't query the
> offsets:
> {noformat}
> jrantil@queue-0:/srv/kafka/kafka$ ./bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --list
> default
> jrantil@queue-0:/srv/kafka/kafka$ ./bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --describe --group default|sort
> Consumer group `default` does not exist or is rebalancing.
> {noformat}
> Retrying the offsets query for 15 minutes or so still reported that the group
> was rebalancing. After restarting our first broker, the group immediately
> started rebalancing. That broker was logging this before the restart:
> {noformat}
> [2015-12-12 13:09:48,517] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
> [2015-12-12 13:10:16,139] INFO [GroupCoordinator 0]: Stabilized group default generation 16 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:10:16,141] INFO [GroupCoordinator 0]: Assignment received from leader for group default for generation 16 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:10:16,575] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 16 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,141] INFO [GroupCoordinator 0]: Stabilized group default generation 17 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,143] INFO [GroupCoordinator 0]: Assignment received from leader for group default for generation 17 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,314] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 17 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,144] INFO [GroupCoordinator 0]: Stabilized group default generation 18 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,145] INFO [GroupCoordinator 0]: Assignment received from leader for group default for generation 18 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,340] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 18 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,146] INFO [GroupCoordinator 0]: Stabilized group default generation 19 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,148] INFO [GroupCoordinator 0]: Assignment received from leader for group default for generation 19 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,238] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 19 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,148] INFO [GroupCoordinator 0]: Stabilized group default generation 20 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,149] INFO [GroupCoordinator 0]: Assignment received from leader for group default for generation 20 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,360] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 20 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,150] INFO [GroupCoordinator 0]: Stabilized group default generation 21 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,152] INFO [GroupCoordinator 0]: Assignment received from leader for group default for generation 21 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,217] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 21 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:16:10,152] INFO [GroupCoordinator 0]: Stabilized group default generation 22 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:16:10,154] INFO [GroupCoordinator 0]: Assignment received from leader for group default for generation 22 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:16:10,339] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 22 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:17:09,155] INFO [GroupCoordinator 0]: Stabilized group default generation 23 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:17:09,157] INFO [GroupCoordinator 0]: Assignment received from leader for group default for generation 23 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:17:09,262] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 23 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:18:08,157] INFO [GroupCoordinator 0]: Stabilized group default generation 24 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:18:08,159] INFO [GroupCoordinator 0]: Assignment received from leader for group default for generation 24 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:18:08,333] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 24 (kafka.coordinator.GroupCoordinator)
> {noformat}
> Our consumers were logging:
> {noformat}
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.AbstractCoordinator Marking the coordinator 2147483647 dead.
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Error UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Auto offset commit failed: Commit cannot be completed due to group rebalance
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.AbstractCoordinator Marking the coordinator 2147483647 dead.
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Error UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Auto offset commit failed: Commit cannot be completed due to group rebalance
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Error UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Auto offset commit failed:
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.AbstractCoordinator Attempt to join group default failed due to unknown member id, resetting and retrying.
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Error UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator Auto offset commit failed:
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.AbstractCoordinator Attempt to join group default failed due to unknown member id, resetting and retrying.
> {noformat}
> I understand that the broker might start rebalancing if my consumers haven't
> reported a heartbeat within the session timeout. This might well have
> happened during my load test. However, the issue here is that the rebalancing
> doesn't stabilize/finish after the load test is done.
> Let me know if I can be of any assistance in tracking this down.
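
The broker log in the report shows the group stabilizing and then starting to restabilize again within a fraction of a second, over and over. One rough way to quantify that cycle is to measure the time between the log entries. The following is a minimal sketch only: the sample lines are copied from the report (with wrapped lines joined), while the regular expression and the function names (parse_events, stabilize_intervals, stable_durations) are made up for illustration and are not part of any Kafka tooling.

```python
import re
from datetime import datetime

# A few broker log entries copied from the report, wrapped lines joined.
LOG = """\
[2015-12-12 13:10:16,139] INFO [GroupCoordinator 0]: Stabilized group default generation 16 (kafka.coordinator.GroupCoordinator)
[2015-12-12 13:10:16,575] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 16 (kafka.coordinator.GroupCoordinator)
[2015-12-12 13:11:15,141] INFO [GroupCoordinator 0]: Stabilized group default generation 17 (kafka.coordinator.GroupCoordinator)
[2015-12-12 13:11:15,314] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 17 (kafka.coordinator.GroupCoordinator)
[2015-12-12 13:12:14,144] INFO [GroupCoordinator 0]: Stabilized group default generation 18 (kafka.coordinator.GroupCoordinator)
"""

# Hypothetical pattern written for the two message shapes seen in this report.
PATTERN = re.compile(
    r"\[(?P<ts>[\d\- :,]+)\] INFO \[GroupCoordinator \d+\]: "
    r"(?P<event>Stabilized group|Preparing to restabilize group) \S+ "
    r"(?:with old )?generation (?P<gen>\d+)"
)


def parse_events(text):
    """Return (timestamp, event, generation) tuples for matching log lines."""
    events = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, match.group("event"), int(match.group("gen"))))
    return events


def stabilize_intervals(events):
    """Seconds between consecutive 'Stabilized group' entries."""
    times = [ts for ts, event, _ in events if event == "Stabilized group"]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def stable_durations(events):
    """Seconds each generation stayed stable before restabilizing began."""
    stabilized = {g: ts for ts, event, g in events if event == "Stabilized group"}
    return {
        g: (ts - stabilized[g]).total_seconds()
        for ts, event, g in events
        if event.startswith("Preparing") and g in stabilized
    }


if __name__ == "__main__":
    events = parse_events(LOG)
    print("seconds between stabilizations:", stabilize_intervals(events))
    print("seconds stable per generation:", stable_durations(events))
```

On the sample entries this gives roughly 59 seconds between stabilizations and well under half a second of stability per generation. It only measures the loop; it does not say what keeps triggering the rebalance.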
-- This message was sent by Atlassian JIRA (v6.3.4#6332)