[ https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360306#comment-15360306 ]
Peter Davis commented on KAFKA-3893:
------------------------------------

Sriharsha, I have witnessed this too, and it very much seems like a bug in Kafka: when a ZooKeeper connection is lost, any other changes in the cluster during the loss are not recognized when the broker reconnects. We see the same loop of "Shrinking ISR" and "Cached zkVersion [###] not equal to that in zookeeper", and the broker never recovers.

For us this happened almost daily when running on a cluster of virtual machines that would get paused for a few seconds every night for a snapshot backup. We disabled the backup, but it's very concerning that Kafka won't recover after a pause!

> Kafka Broker ID disappears from /brokers/ids
> --------------------------------------------
>
>                 Key: KAFKA-3893
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3893
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: chaitra
>            Priority: Critical
>
> Kafka version used: 0.8.2.1
> ZooKeeper version: 3.4.6
> We have a scenario where Kafka's broker in the ZooKeeper path /brokers/ids just
> disappears.
> We see the ZooKeeper connection active and no network issue.
> The ZooKeeper connection timeout is set to 6000ms in server.properties.
> Hence Kafka is not participating in the cluster.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)