[ https://issues.apache.org/jira/browse/KAFKA-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879059#comment-15879059 ]

Shannon Carey commented on KAFKA-3798:
--------------------------------------

Recently had this problem with an 0.8.2.2 consumer as well. I didn't see any Zookeeper connection timeout though. It looks like this was precipitated by several Kafka servers restarting at the same time. I don't know why the topic was deleted from ZK at one point.

Here's an overview of the log messages:

[...-1486594733560-c90a4419], begin rebalancing consumer ...-1486594733560-c90a4419 try #0
[...-1486594733560-c90a4419], exception during rebalance
org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/.../ids/...-1486594733560-c90a4419
end rebalancing consumer ...-1486594733560-c90a4419 try #0
Rebalancing attempt failed. Clearing the cache before the next rebalancing operation is triggered
... various stopping/shutting down of threads ...
Stopping all fetchers
Topic for path /brokers/topics/mytopic gets deleted, which should not happen at this time
All connections stopped
Cleared all relevant queues for this fetcher
Cleared the data chunks in all the consumer message iterators
Committing all offsets after clearing the fetcher queues
begin rebalancing consumer ...-1486594733560-c90a4419 try #1
...
can't rebalance after 15 retries

Repeats that several times, for ~6hrs before finally gives up for some reason.

> Kafka Consumer 0.10.0.0 killed after rebalancing exception
> ----------------------------------------------------------
>
>                 Key: KAFKA-3798
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3798
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 0.10.0.0
>         Environment: Production
>            Reporter: Sahitya Agrawal
>            Assignee: Neha Narkhede
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Hi ,
> I have a topic with 100 partitions and 25 consumers. Consumers were working fine up to some time.
> After some time I see kafka rebalancing exception in the logs.
> CPU usage is also 100 % at that time. Consumer process got killed after that.
> Kafka version : 0.10.0.0
> Some Error print from the logs are following:
> kafka.common.ConsumerRebalanceFailedException: prod_ip-**** can't rebalance after 10 retries
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:670)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$2.run(ZookeeperConsumerConnector.scala:589)
> exception during rebalance
> org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/prod/ids/prod_ip-*******
>         at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1000)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1099)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1094)
>         at kafka.utils.ZkUtils.readData(ZkUtils.scala:542)
>         at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:61)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:674)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:646)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:637)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:637)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:637)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:636)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKSessionExpireListener.handleNewSession(ZookeeperConsumerConnector.scala:522)
>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/prod/ids/prod_ip-******
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184)
>         at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:124)
>         at org.I0Itec.zkclient.ZkClient$12.call(ZkClient.java:1103)
>         at org.I0Itec.zkclient.ZkClient$12.call(ZkClient.java:1099)
>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:990)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)