[ https://issues.apache.org/jira/browse/KAFKA-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999664#comment-13999664 ]
Krzysztof Ociepa commented on KAFKA-1407:
-----------------------------------------

I have the same issue using the latest 0.8.1.1 release:

[2014-05-16 06:32:15,591] INFO Partition [main_topic,5] on broker 2: Shrinking ISR for partition [main_topic,5] from 3,2,1 to 2 (kafka.cluster.Partition)
[2014-05-16 06:32:15,633] ERROR Conditional update of path /brokers/topics/main_topic/partitions/5/state with data {"controller_epoch":6,"leader":2,"version":1,"leader_epoch":26,"isr":[2]} and expected version 51 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/main_topic/partitions/5/state (kafka.utils.ZkUtils$)
[2014-05-16 06:32:15,643] INFO Partition [main_topic,5] on broker 2: Cached zkVersion [51] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)

> Broker can not return to ISR because of BadVersionException
> -----------------------------------------------------------
>
>                 Key: KAFKA-1407
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1407
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.1
>            Reporter: Dmitry Bugaychenko
>            Assignee: Neha Narkhede
>
> Each morning we found a broker out of ISR and stuck, with its log full of messages like:
> {code}
> INFO | jvm 1 | 2014/04/21 08:36:21 | [2014-04-21 09:36:21,907] ERROR Conditional update of path /brokers/topics/topic2/partitions/1/state with data {"controller_epoch":46,"leader":2,"version":1,"leader_epoch":38,"isr":[2]} and expected version 53 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/topic2/partitions/1/state (kafka.utils.ZkUtils$)
> INFO | jvm 1 | 2014/04/21 08:36:21 | [2014-04-21 09:36:21,907] INFO Partition [topic2,1] on broker 2: Cached zkVersion [53] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> It seems that the broker cannot recover after a short network breakdown, and the only way to bring it back is to restart it using kill -9.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
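
For readers unfamiliar with the log messages quoted in this thread, the failure pattern can be modeled as a compare-and-set on a versioned node: a write succeeds only if the writer's cached version matches the stored one. The sketch below is a toy in-memory model, not Kafka or ZooKeeper code. `VersionedNode`, `BadVersionError`, `shrink_isr` and the `refresh_on_conflict` flag are hypothetical names introduced for illustration. It shows why a writer that skips the update and keeps its stale version fails on every attempt. Its re-read-and-retry branch is an assumption for contrast only, not the fix applied in Kafka.

```python
class BadVersionError(Exception):
    """Stands in for ZooKeeper's KeeperException.BadVersionException."""


class VersionedNode:
    """Toy in-memory stand-in for a znode: data plus a version counter."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def read(self):
        return self.data, self.version

    def conditional_update(self, data, expected_version):
        # Compare-and-set: write only if the caller's version is current.
        if expected_version != self.version:
            raise BadVersionError(
                f"expected version {expected_version}, node is at {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version


def shrink_isr(node, cached_version, new_isr, refresh_on_conflict):
    """Try to write new_isr; return (succeeded, version to cache next)."""
    try:
        return True, node.conditional_update({"isr": new_isr}, cached_version)
    except BadVersionError:
        if not refresh_on_conflict:
            # Pattern seen in the log: skip the update, keep the stale version.
            return False, cached_version
        # Hypothetical alternative: re-read the current version and retry once.
        _, current = node.read()
        return True, node.conditional_update({"isr": new_isr}, current)


state = VersionedNode({"isr": [3, 2, 1]})
_, cached = state.read()                       # writer caches version 0
state.conditional_update({"isr": [3, 1]}, 0)   # another writer bumps it to 1

# With a stale cached version, every attempt fails the same way.
stuck = [shrink_isr(state, cached, [2], refresh_on_conflict=False)[0] for _ in range(3)]
# Re-reading the version lets the write go through in this toy model.
recovered, cached = shrink_isr(state, cached, [2], refresh_on_conflict=True)
print(stuck, recovered, cached)
```

In a real cluster a blind re-read may not be safe, because the stored state can reflect a newer leader or controller epoch. The model ignores that and only illustrates why the stale cached version never heals by itself.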