[ https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101956#comment-15101956 ]
Alexander Binzberger commented on KAFKA-3042:
---------------------------------------------

I think this is the same as KAFKA-1382 and KAFKA-2729, and maybe also KAFKA-1407.

Seen this on 0.9.0.0; a colleague told me he might have seen it on 0.8.2. On 0.9 it happened after high network load (possibly a network outage) and possibly slow disk IO. My test cluster is running on virtual machines on an OpenStack system, where the virtual machines' disk IO may also go over the network for reads/writes.

At the moment I see this once or twice a day, and the broker just does not recover from that state. When this happens I cannot produce to at least some partitions. It also affects the offsets topic partitions, which means I cannot consume some things any more. Taking down the whole cluster and restarting it resolves the issue, but that is not an option for a production system.

The ISR state in ZooKeeper and kafka-topics.sh --describe show all partitions as perfectly in service, but the metadata returned over the Kafka protocol tells a different story (matching the broker log): http://pastebin.com/6ekA5w3a

> updateIsr should stop after failed several times due to zkVersion issue
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-3042
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3042
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>         Environment: jdk 1.7
> centos 6.4
>            Reporter: Jiahongchao
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact it is
> a follower.
> So after several failed tries, it needs to find out who is the leader.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
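
A note on reproducing the check described in the comment above: the "metadata over the Kafka protocol" can be dumped with a few lines of Java consumer client code and compared against the ZooKeeper view from kafka-topics.sh --describe. The sketch below is illustrative only and not from the original report; the bootstrap address and topic name are placeholders.

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class IsrMetadataCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder bootstrap address: point this at one of the affected brokers.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // Ask the broker for its view of the topic's partitions (placeholder topic name)
            // and print leader + ISR as returned over the Kafka protocol.
            for (PartitionInfo p : consumer.partitionsFor("my-topic")) {
                StringBuilder isr = new StringBuilder();
                for (Node n : p.inSyncReplicas()) {
                    isr.append(n.id()).append(' ');
                }
                System.out.printf("partition %d leader=%s isr=[%s]%n",
                        p.partition(),
                        p.leader() == null ? "none" : String.valueOf(p.leader().id()),
                        isr.toString().trim());
            }
        }
    }
}
{code}

In the failure mode described in the comment, the leader/ISR printed here would be expected to disagree with the --describe output for the affected partitions.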
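
For context on the "Cached zkVersion X not equal to that in zookeeper" message quoted in the issue description: the ISR update is a conditional write to ZooKeeper that only succeeds if the znode still has the version the broker cached, so a stale cached version makes every retry fail. The following is a minimal, hypothetical sketch of that conditional-update pattern using the plain ZooKeeper Java client; it is not Kafka's actual implementation, and the path, data, and retry handling are simplified placeholders.

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ConditionalZkUpdate {

    // Overwrite the znode at 'path' only if its version in ZooKeeper still matches
    // the version cached when it was last read. Returns the new version on success,
    // or -1 when the cached version is stale and the conditional write is rejected.
    static int conditionalUpdate(ZooKeeper zk, String path, byte[] newData, int cachedVersion)
            throws KeeperException, InterruptedException {
        try {
            Stat stat = zk.setData(path, newData, cachedVersion);
            return stat.getVersion();
        } catch (KeeperException.BadVersionException e) {
            // This is the case behind the repeated
            // "Cached zkVersion X not equal to that in zookeeper, skip updating ISR" log line:
            // someone else (e.g. the controller) has modified the znode, so the cached
            // version is stale and every retry with it keeps failing until the broker
            // re-reads the znode (or re-checks who the leader actually is).
            return -1;
        }
    }
}
{code}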