[ https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101956#comment-15101956 ]
Alexander Binzberger commented on KAFKA-3042:
---------------------------------------------

I think this is the same as KAFKA-1382 and KAFKA-2729, and maybe also KAFKA-1407.

Seen this on 0.9.0.0; a colleague told me he might have seen it on 0.8.2. On 0.9 it happened after high network load (possibly a network outage) and possibly slow disk IO. My test cluster is running on virtual machines on an OpenStack system, where the virtual machines' disk IO may also go over the network for reads/writes.

At the moment I see this once or twice a day, and the broker just does not recover from that state. When this happens I cannot produce to at least some partitions. It also affects the offsets topic partitions, which means I cannot consume some things any more. Taking down the whole cluster and restarting it resolves the issue, but that is not an option for a production system.

The ISR state in ZooKeeper and kafka-topics.sh --describe show all partitions as perfectly in service, but the metadata returned over the Kafka protocol tells a different story (matching the broker log): http://pastebin.com/6ekA5w3a

> updateIsr should stop after failed several times due to zkVersion issue
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-3042
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3042
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>         Environment: jdk 1.7
> centos 6.4
>            Reporter: Jiahongchao
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact it is
> a follower.
> So after several failed tries, it needs to find out who is the leader.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
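
A note on reproducing the check described in the comment above: the "metadata over the Kafka protocol" can be dumped with a few lines of Java consumer client code and compared against the ZooKeeper view from kafka-topics.sh --describe. The sketch below is illustrative only and not from the original report; the bootstrap address and topic name are placeholders.

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class IsrMetadataCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder bootstrap address: point this at one of the affected brokers.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // Ask the broker for its view of the topic's partitions (placeholder topic name)
            // and print leader + ISR as returned over the Kafka protocol.
            for (PartitionInfo p : consumer.partitionsFor("my-topic")) {
                StringBuilder isr = new StringBuilder();
                for (Node n : p.inSyncReplicas()) {
                    isr.append(n.id()).append(' ');
                }
                System.out.printf("partition %d leader=%s isr=[%s]%n",
                        p.partition(),
                        p.leader() == null ? "none" : String.valueOf(p.leader().id()),
                        isr.toString().trim());
            }
        }
    }
}
{code}

In the failure mode described in the comment, the leader/ISR printed here would be expected to disagree with the --describe output for the affected partitions.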
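
For context on the "Cached zkVersion X not equal to that in zookeeper" message quoted in the issue description: the ISR update is a conditional write to ZooKeeper that only succeeds if the znode still has the version the broker cached, so a stale cached version makes every retry fail. The following is a minimal, hypothetical sketch of that conditional-update pattern using the plain ZooKeeper Java client; it is not Kafka's actual implementation, and the path, data, and retry handling are simplified placeholders.

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ConditionalZkUpdate {

    // Overwrite the znode at 'path' only if its version in ZooKeeper still matches
    // the version cached when it was last read. Returns the new version on success,
    // or -1 when the cached version is stale and the conditional write is rejected.
    static int conditionalUpdate(ZooKeeper zk, String path, byte[] newData, int cachedVersion)
            throws KeeperException, InterruptedException {
        try {
            Stat stat = zk.setData(path, newData, cachedVersion);
            return stat.getVersion();
        } catch (KeeperException.BadVersionException e) {
            // This is the case behind the repeated
            // "Cached zkVersion X not equal to that in zookeeper, skip updating ISR" log line:
            // someone else (e.g. the controller) has modified the znode, so the cached
            // version is stale and every retry with it keeps failing until the broker
            // re-reads the znode (or re-checks who the leader actually is).
            return -1;
        }
    }
}
{code}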