[ https://issues.apache.org/jira/browse/KAFKA-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703193#comment-15703193 ]
Evan Nelson edited comment on KAFKA-3798 at 11/28/16 9:34 PM:
--------------------------------------------------------------

We are experiencing the same issue with 0.8.2.2:

org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/***/ids/***
    at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) ~[zkclient-0.3.jar:0.3]
    at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) ~[zkclient-0.3.jar:0.3]
    etc...

(identifiers replaced with ***)

This happens on two different topics, one with 20 partitions and one with 40. We have 22 consumers for each. The event always seems to be precipitated by a zookeeper connection timeout, which may have been triggered by a long GC pause (~5.5 seconds). Once the rebalance loop starts it _never_ recovers, no matter how many retries we allot.
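
For reference, the old ZooKeeper-based consumer settings that bear on this failure mode can be sketched as below. The property names are the standard high-level consumer settings; the values are illustrative assumptions only, not anyone's production config and not a confirmed fix. A commonly cited rule of thumb is to keep rebalance.max.retries * rebalance.backoff.ms larger than zookeeper.session.timeout.ms, and to keep the session timeout above the longest expected GC pause. Given that the loop described above does not recover regardless of the retry count, settings like these can at best make the triggering session expiry less likely.

```properties
# Illustrative values (assumptions), chosen only to show the relationship.

# Session timeout comfortably above the ~5.5 s GC pause mentioned above.
zookeeper.session.timeout.ms=15000
zookeeper.connection.timeout.ms=15000

# Retry window: 10 retries * 2000 ms = 20000 ms, which exceeds the
# 15000 ms session timeout.
rebalance.max.retries=10
rebalance.backoff.ms=2000
```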

> Kafka Consumer 0.10.0.0 killed after rebalancing exception
> ----------------------------------------------------------
>
> Key: KAFKA-3798
> URL: https://issues.apache.org/jira/browse/KAFKA-3798
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer
> Affects Versions: 0.10.0.0
> Environment: Production
> Reporter: Sahitya Agrawal
> Assignee: Neha Narkhede
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Hi ,
> I have a topic with 100 partitions and 25 consumers. Consumers were working fine up to some time.
> After some time I see kafka rebalancing exception in the logs. CPU usage is also 100 % at that time. Consumer process got killed after that.
> Kafka version : 0.10.0.0
> Some Error print from the logs are following:
> kafka.common.ConsumerRebalanceFailedException: prod_ip-**** can't rebalance after 10 retries
>     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:670)
>     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$2.run(ZookeeperConsumerConnector.scala:589)
> exception during rebalance
> org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/prod/ids/prod_ip-*******
>     at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1000)
>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1099)
>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1094)
>     at kafka.utils.ZkUtils.readData(ZkUtils.scala:542)
>     at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:61)
>     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:674)
>     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:646)
>     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:637)
>     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:637)
>     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:637)
>     at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>     at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:636)
>     at kafka.consumer.ZookeeperConsumerConnector$ZKSessionExpireListener.handleNewSession(ZookeeperConsumerConnector.scala:522)
>     at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
>     at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/prod/ids/prod_ip-******
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184)
>     at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:124)
>     at org.I0Itec.zkclient.ZkClient$12.call(ZkClient.java:1103)
>     at org.I0Itec.zkclient.ZkClient$12.call(ZkClient.java:1099)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:990)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)