[ https://issues.apache.org/jira/browse/KAFKA-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879059#comment-15879059 ]

Shannon Carey commented on KAFKA-3798:
--------------------------------------

I recently had this problem with a 0.8.2.2 consumer as well. I didn't see any
ZooKeeper connection timeout, though. It looks like this was precipitated by
several Kafka servers restarting at the same time. I don't know why the topic
was deleted from ZK at one point. Here's an overview of the log messages:

[...-1486594733560-c90a4419], begin rebalancing consumer ...-1486594733560-c90a4419 try #0
[...-1486594733560-c90a4419], exception during rebalance
org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/.../ids/...-1486594733560-c90a4419
end rebalancing consumer ...-1486594733560-c90a4419 try #0
Rebalancing attempt failed. Clearing the cache before the next rebalancing operation is triggered
... various stopping/shutting down of threads ...
Stopping all fetchers
Topic for path /brokers/topics/mytopic gets deleted, which should not happen at this time
All connections stopped
Cleared all relevant queues for this fetcher
Cleared the data chunks in all the consumer message iterators
Committing all offsets after clearing the fetcher queues
begin rebalancing consumer ...-1486594733560-c90a4419 try #1
...
can't rebalance after 15 retries

This sequence repeats several times over about 6 hours before the consumer
finally gives up for some reason.
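The "can't rebalance after N retries" message comes from the old ZooKeeper-based consumer's bounded rebalance loop. The sketch below is a simplified, hypothetical model of that loop (the class and method names are illustrative, not Kafka's code): each attempt is retried up to `maxRetries` times with a `backoffMs` pause, an exception during an attempt counts as a failed attempt, and the loop throws once the retries are exhausted. In the real consumer these bounds come from the `rebalance.max.retries` and `rebalance.backoff.ms` settings, so the different counts in the two reports (10 and 15) presumably reflect different configured values.

```java
import java.util.function.BooleanSupplier;

// Hypothetical, simplified model of the old ZooKeeper-based consumer's
// bounded rebalance loop; names are illustrative, not Kafka's API.
class RebalanceFailedException extends RuntimeException {
    RebalanceFailedException(String consumerId, int maxRetries) {
        super(consumerId + " can't rebalance after " + maxRetries + " retries");
    }
}

class RebalanceLoop {
    /**
     * Runs {@code attempt} up to {@code maxRetries} times, pausing
     * {@code backoffMs} after each failed try. Returns the number of tries
     * used on success; throws RebalanceFailedException once retries run out.
     */
    static int syncedRebalance(String consumerId, int maxRetries, long backoffMs,
                               BooleanSupplier attempt) {
        for (int i = 0; i < maxRetries; i++) {
            System.out.println("begin rebalancing consumer " + consumerId + " try #" + i);
            boolean done;
            try {
                done = attempt.getAsBoolean();
            } catch (RuntimeException e) {
                // An exception (e.g. NoNode for the consumer's own ids znode)
                // is treated as a failed attempt rather than as fatal.
                System.out.println("exception during rebalance: " + e.getMessage());
                done = false;
            }
            System.out.println("end rebalancing consumer " + consumerId + " try #" + i);
            if (done) {
                return i + 1;
            }
            // The real consumer also stops fetchers and clears its caches
            // before retrying; this model only backs off.
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new IllegalStateException("interrupted during rebalance backoff", ie);
            }
        }
        throw new RebalanceFailedException(consumerId, maxRetries);
    }
}
```

For example, with `maxRetries = 15` and an attempt that always fails with NoNode, the model logs 15 begin/end pairs for a consumer `c1` and then throws `c1 can't rebalance after 15 retries`, the same shape as the log overview above.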


> Kafka Consumer 0.10.0.0 killed after rebalancing exception
> ----------------------------------------------------------
>
>                 Key: KAFKA-3798
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3798
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 0.10.0.0
>         Environment: Production
>            Reporter: Sahitya Agrawal
>            Assignee: Neha Narkhede
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Hi,
> I have a topic with 100 partitions and 25 consumers. The consumers were
> working fine for some time.
> After a while I see Kafka rebalancing exceptions in the logs. CPU usage is
> also at 100% at that time. The consumer process was killed after that.
> Kafka version: 0.10.0.0
> Some of the errors from the logs are as follows:
> kafka.common.ConsumerRebalanceFailedException: prod_ip-**** can't rebalance after 10 retries
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:670)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$2.run(ZookeeperConsumerConnector.scala:589)
> exception during rebalance
> org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/prod/ids/prod_ip-*******
>         at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1000)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1099)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1094)
>         at kafka.utils.ZkUtils.readData(ZkUtils.scala:542)
>         at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:61)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:674)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:646)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:637)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:637)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:637)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:636)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKSessionExpireListener.handleNewSession(ZookeeperConsumerConnector.scala:522)
>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/prod/ids/prod_ip-******
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184)
>         at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:124)
>         at org.I0Itec.zkclient.ZkClient$12.call(ZkClient.java:1103)
>         at org.I0Itec.zkclient.ZkClient$12.call(ZkClient.java:1099)
>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:990)
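The trace shows `ZKSessionExpireListener.handleNewSession` triggering `syncedRebalance`, which fails in `TopicCount.constructTopicCount` while reading the consumer's own registration under `/consumers/<group>/ids/<consumerId>`. That registration is an ephemeral znode, so ZooKeeper removes it when the session that created it expires. The toy model below is a hypothetical in-memory stand-in, not the ZooKeeper or ZkClient API, and illustrates only that property: a read of the ids path after the owning session has expired, and before the node is re-created, fails with NoNode.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical in-memory stand-in for ZooKeeper ephemeral znodes;
// not the ZooKeeper or ZkClient API.
class ToyZk {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Long> ownerSession = new HashMap<>();

    // Path layout of the old consumer's id registry (group and id are placeholders).
    static String consumerIdPath(String group, String consumerId) {
        return "/consumers/" + group + "/ids/" + consumerId;
    }

    // Creates a node that lives only as long as the given session.
    void createEphemeral(String path, String value, long sessionId) {
        data.put(path, value);
        ownerSession.put(path, sessionId);
    }

    // Mirrors the NoNode failure in the trace when the znode is absent.
    String readData(String path) {
        String value = data.get(path);
        if (value == null) {
            throw new NoSuchElementException("KeeperErrorCode = NoNode for " + path);
        }
        return value;
    }

    // On session expiry, every ephemeral node owned by that session is removed.
    void expireSession(long sessionId) {
        Iterator<Map.Entry<String, Long>> it = ownerSession.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() == sessionId) {
                data.remove(entry.getKey());
                it.remove();
            }
        }
    }
}
```

For example, registering `/consumers/prod/ids/c1` under session 1 and then expiring session 1 makes `readData` on that path throw, while a node owned by session 2 is unaffected.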



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
