[ https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247802#comment-15247802 ]
Flavio Junqueira commented on KAFKA-3042:
-----------------------------------------

[~junrao] It makes sense, thanks for the analysis. Trying to reconstruct the problem in steps, this is what's going on:
# Broker 5 thinks broker 4 is alive and sends a LeaderAndIsr request to broker 1 with 4 as the leader.
# Broker 1 doesn't have 4 cached as a live broker, so it fails the request to make it a follower of the partition. The LeaderAndIsr request carries a list of live leaders, and I suppose 4 is in that list.

To sort this out, I can see two options:
# We simply update the metadata cache upon receiving a LeaderAndIsr request using the list of live leaders. This update needs to be the union of the current set with the set of leaders (see the sketch after the quoted issue description below).
# As you also suggested, we send an UpdateMetadata request first to update the set of live brokers.

I can't see any problem with 1, and I can't see any immediate problem with 2 either, but I'm concerned about running into another race condition if we send an update first. What do you think?

> updateIsr should stop after failed several times due to zkVersion issue
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-3042
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3042
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>         Environment: jdk 1.7
>                      centos 6.4
>            Reporter: Jiahongchao
>             Fix For: 0.10.0.0
>
>         Attachments: controller.log, server.log.2016-03-23-01, state-change.log
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR".
> I think this is because the broker considers itself the leader when in fact it's a follower.
> So after several failed tries, it needs to find out who the leader is.
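For concreteness, here is a minimal sketch of option 1 in Scala. The names (Broker, LeaderAndIsrRequest, MetadataCacheSketch) are illustrative only and do not refer to Kafka's actual broker classes; the point is just that the cache takes the union of its current live set and the leaders carried in the request, so a follower transition that names a leader the cache hasn't seen yet (broker 4 above) is not rejected.

{code:scala}
// Illustrative types only -- not Kafka's real request or cache classes.
case class Broker(id: Int, host: String, port: Int)
case class LeaderAndIsrRequest(liveLeaders: Set[Broker])

class MetadataCacheSketch {
  // Brokers this node currently believes are alive, keyed by broker id.
  private var aliveBrokers: Map[Int, Broker] = Map.empty

  // Full refresh, e.g. from an UpdateMetadata request.
  def updateAliveBrokers(brokers: Set[Broker]): Unit = synchronized {
    aliveBrokers = brokers.map(b => b.id -> b).toMap
  }

  // Option 1: union the cached live set with the live leaders listed
  // in the LeaderAndIsr request, rather than replacing the cache.
  def mergeLiveLeaders(request: LeaderAndIsrRequest): Unit = synchronized {
    aliveBrokers = aliveBrokers ++ request.liveLeaders.map(b => b.id -> b).toMap
  }

  def getAliveBroker(id: Int): Option[Broker] = synchronized {
    aliveBrokers.get(id)
  }
}

object Demo extends App {
  val cache = new MetadataCacheSketch
  cache.updateAliveBrokers(Set(Broker(1, "b1", 9092), Broker(5, "b5", 9092)))
  // Controller sends a LeaderAndIsr request naming broker 4 as a live leader.
  cache.mergeLiveLeaders(LeaderAndIsrRequest(Set(Broker(4, "b4", 9092))))
  // The follower transition can now resolve broker 4 instead of failing.
  assert(cache.getAliveBroker(4).isDefined)
}
{code}

Because it is a union rather than a replacement, this sketch never shrinks the cached live set on a LeaderAndIsr request; stale brokers would still only be removed by a full metadata update, which is why option 2 (sending UpdateMetadata first) remains worth comparing.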