[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

Jun Rao (JIRA) Tue, 02 Aug 2016 15:11:55 -0700

    [ https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404901#comment-15404901 ]


Jun Rao commented on KAFKA-3042:
--------------------------------

[~onurkaraman], there are a couple of things.

1. Currently, when a broker starts up, it expects the very first 
LeaderAndIsrRequest to contain all the partitions hosted on this broker. After 
receiving it, we read the last checkpointed high watermarks and start the high 
watermark checkpoint thread. If we combine UpdateMetadataRequest and 
LeaderAndIsrRequest, the very first request that a broker receives could be an 
UpdateMetadataRequest that includes partitions not hosted on this broker. Then we 
may checkpoint high watermarks for partitions that this broker does not actually 
host.

2. Currently, LeaderAndIsrRequest is used to inform replicas about the new 
leader and is sent only to brokers storing the partition. UpdateMetadataRequest 
is used to update the metadata cache for clients and is sent to every broker. 
Technically, they serve different purposes, so using separate requests makes 
logical sense. We could use a single request to do both, but I am not sure 
whether that would make debugging clearer or more confusing. In any case, it 
would require significant code changes. I am not opposed to that; I just think 
that if we want to do it, we probably want to think through how to improve the 
controller logic holistically, since there are other known pain points in the 
controller.
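To make the two points concrete, here is a minimal Python sketch under stated assumptions. The function names (`leader_and_isr_requests`, `update_metadata_requests`, `init_hw_checkpoint`) are hypothetical and do not correspond to Kafka's controller or ReplicaManager code; partitions are modeled as (topic, partition) tuples, and the assignment maps each partition to its replica broker ids. It models the fan-out from point 2 and the startup assumption from point 1, and shows that a broker whose first request carried cluster-wide metadata would start checkpointing a partition it does not host.

```python
from collections import defaultdict


def leader_and_isr_requests(assignment):
    """LeaderAndIsrRequest fan-out: each broker hears only about partitions it replicates."""
    per_broker = defaultdict(list)
    for tp, replicas in sorted(assignment.items()):
        for broker_id in replicas:
            per_broker[broker_id].append(tp)
    return dict(per_broker)


def update_metadata_requests(assignment, live_brokers):
    """UpdateMetadataRequest fan-out: every live broker gets every partition."""
    all_partitions = sorted(assignment)
    return {broker_id: list(all_partitions) for broker_id in sorted(live_brokers)}


def init_hw_checkpoint(first_request_partitions, checkpointed_hw):
    """Startup step: treat the first request's partitions as the full hosted set.

    Partitions with no checkpointed value start at 0 (an assumption of this sketch).
    """
    return {tp: checkpointed_hw.get(tp, 0) for tp in first_request_partitions}


assignment = {("t", 0): [1, 2], ("t", 1): [2, 3]}
live_brokers = {1, 2, 3, 4}
on_disk_hw = {("t", 0): 42}  # broker 1's last checkpointed high watermarks (illustrative)

# Separate requests: broker 1's first LeaderAndIsrRequest lists only ("t", 0).
lair = leader_and_isr_requests(assignment)
print(init_hw_checkpoint(lair[1], on_disk_hw))  # {('t', 0): 42}

# If the first request were a (hypothetically merged) metadata request,
# broker 1 would also start checkpointing ("t", 1), which it does not host.
umr = update_metadata_requests(assignment, live_brokers)
print(init_hw_checkpoint(umr[1], on_disk_hw))  # {('t', 0): 42, ('t', 1): 0}
```

Broker 4 receives no LeaderAndIsrRequest at all because it hosts no replicas, yet it still receives an UpdateMetadataRequest; that difference in recipients is why the two requests currently go to different sets of brokers.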


> updateIsr should stop after failed several times due to zkVersion issue
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-3042
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3042
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>         Environment: jdk 1.7
> centos 6.4
>            Reporter: Jiahongchao
>             Fix For: 0.10.1.0
>
>         Attachments: controller.log, server.log.2016-03-23-01, state-change.log
>
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact it 
> is a follower.
> So after several failed tries, it needs to find out who the leader is.
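The reporter's suggestion can be sketched as a bounded count of version conflicts around a conditional write. This is a minimal Python sketch, not Kafka's implementation: `FakeZkNode` is an in-memory stand-in that mimics ZooKeeper's version-checked `setData` (a mismatched expected version is rejected, analogous to ZooKeeper's BadVersion error), while `PartitionLeaderState`, `max_conflicts`, and the `needs_leader_refresh` flag are hypothetical. Actually re-discovering the real leader is left as a flag rather than implemented.

```python
import logging

log = logging.getLogger(__name__)


class VersionConflict(Exception):
    """Raised when a conditional write names a stale version (like ZooKeeper's BadVersion)."""


class FakeZkNode:
    """In-memory stand-in for a znode that only accepts version-checked writes."""

    def __init__(self, data, version=0):
        self.data = data
        self.version = version

    def set_data(self, data, expected_version):
        if expected_version != self.version:
            raise VersionConflict(f"expected {expected_version}, found {self.version}")
        self.data = data
        self.version += 1
        return self.version


class PartitionLeaderState:
    """Hypothetical per-partition state on a broker that believes it leads the partition."""

    def __init__(self, node, cached_zk_version, max_conflicts=3):
        self.node = node
        self.cached_zk_version = cached_zk_version
        self.max_conflicts = max_conflicts
        self.conflicts = 0
        self.needs_leader_refresh = False

    def update_isr(self, new_isr):
        if self.needs_leader_refresh:
            # Already gave up: skip the write until leadership has been re-discovered.
            return "stopped"
        try:
            self.cached_zk_version = self.node.set_data(new_isr, self.cached_zk_version)
            self.conflicts = 0  # a successful write resets the conflict count
            return "updated"
        except VersionConflict:
            self.conflicts += 1
            log.info("Cached zkVersion %d not equal to that in zookeeper, skip updating ISR",
                     self.cached_zk_version)
            if self.conflicts >= self.max_conflicts:
                # Bounded retries: stop and flag that the real leader must be looked up.
                self.needs_leader_refresh = True
                return "gave_up"
            return "skipped"


# Illustrative values: the broker caches version 54 but the znode is already at 55.
state = PartitionLeaderState(FakeZkNode([1, 2], version=55), cached_zk_version=54)
print([state.update_isr([1]) for _ in range(5)])
# ['skipped', 'skipped', 'gave_up', 'stopped', 'stopped']
```

Counting consecutive conflicts per partition, rather than looping inside a single call, reflects the report: the same stale write is attempted each time the broker tries to update the ISR, so without a bound it logs the same message indefinitely.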



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
