[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237604#comment-15237604
 ] 

James Cheng edited comment on KAFKA-3042 at 4/12/16 5:37 PM:
-------------------------------------------------------------

Thanks [~fpj]. Do you need any additional info from us? I don't think we have 
any other logs, but let us know if you have any questions.

About your findings:
>From your comment at 
>https://issues.apache.org/jira/browse/KAFKA-3042?focusedCommentId=15236055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15236055,
> you said that broker 3 failed to release leadership to broker 4 because 
>broker 4 was offline:
{noformat}
[2016-04-09 00:40:58,144] ERROR Broker 1 received LeaderAndIsrRequest with 
correlation id 0 from controller 5 epoch 415 for partition 
[tec1.en2.frontend.syncPing,7] but cannot become follower since the new leader 
4 is unavailable. (state.change.logger)
{noformat}
What is the correct behavior for that scenario? Should broker 3 continue 
leadership? Or should it give up leadership completely until a controller comes 
back and tells it who is the new leader? Does broker 3 send back a response (or 
error) to the controller saying that it was unable to accept that change?

What happens in this scenario?
1) Broker 1 is leader of a partition.
2) Controller sends a LeaderAndIsrRequest to brokers 1, 2, and 3, saying that 
broker 4 is the new leader.
3) Brokers 2 and 3 receive the LeaderAndIsrRequest and accept the change.
4) The LeaderAndIsrRequest to broker 1 is delayed en route by network latency.

During this delay, won't different brokers have different ideas of who the 
leader is? Broker 1 thinks it is the leader. Brokers 2, 3, 4, and 5 think that 
broker 4 is the leader. Or did I miss something?
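
To make the question concrete, here is a toy model of the four steps above. It is only a sketch and not Kafka's code: the `Broker` class, `on_leader_and_isr`, and the in-memory "delivery" lists are made up for illustration, and the only thing assumed about the real protocol is that the request carries a leader epoch that lets a broker ignore stale updates.

```python
from dataclasses import dataclass


@dataclass
class Broker:
    """Toy broker that only remembers who it believes leads the partition."""
    broker_id: int
    leader: int
    leader_epoch: int = 0

    def on_leader_and_isr(self, new_leader: int, leader_epoch: int) -> None:
        # Accept the change only if it is newer than what we already know.
        if leader_epoch > self.leader_epoch:
            self.leader = new_leader
            self.leader_epoch = leader_epoch


def leader_views(brokers):
    """Map broker id -> the leader that broker currently believes in."""
    return {b.broker_id: b.leader for b in brokers}


# Step 1: broker 1 leads the partition, and every broker agrees.
brokers = {i: Broker(broker_id=i, leader=1) for i in (1, 2, 3)}

# Step 2: the controller addresses a request to brokers 1, 2 and 3.
request = {"new_leader": 4, "leader_epoch": 1}
delivered = [2, 3]   # Step 3: brokers 2 and 3 have already applied it.
in_flight = [1]      # Step 4: the copy for broker 1 is still on the wire.

for broker_id in delivered:
    brokers[broker_id].on_leader_and_isr(**request)

during_delay = leader_views(brokers.values())
print("during delay:", during_delay)

# The delayed copy finally arrives and the views converge.
for broker_id in in_flight:
    brokers[broker_id].on_leader_and_isr(**request)

after_delivery = leader_views(brokers.values())
print("after delivery:", after_delivery)
```

In this model the views do diverge while the request to broker 1 is in flight, and they converge once it lands. What the sketch cannot answer is what broker 1 is allowed to do during that window, which is the part I am asking about.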









> updateIsr should stop after failed several times due to zkVersion issue
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-3042
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3042
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>         Environment: jdk 1.7
> centos 6.4
>            Reporter: Jiahongchao
>         Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> Sometimes a broker repeatedly logs
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact 
> it is a follower.
> So after several failed tries, it needs to find out who the leader is.
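
The behaviour requested in the issue description can be sketched as a version-checked write with a bounded number of retries. This is a minimal illustration, assuming the ISR update is a conditional write that fails whenever the cached version is stale (which is what the log message suggests). `VersionedStore`, `update_isr`, and `max_attempts` are hypothetical names, not Kafka or ZooKeeper APIs.

```python
class VersionedStore:
    """Hypothetical stand-in for a store with versioned conditional writes."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def conditional_write(self, value, expected_version):
        # The write succeeds only if the caller's version is current.
        if expected_version != self.version:
            return False
        self.value = value
        self.version += 1
        return True


def update_isr(store, new_isr, cached_version, max_attempts=3):
    """Try a conditional write; give up and re-read after repeated failures.

    Returns (succeeded, attempts_made, version_known_afterwards).
    """
    for attempt in range(1, max_attempts + 1):
        if store.conditional_write(new_isr, cached_version):
            return True, attempt, store.version
    # Every attempt failed against the same stale version, so retrying
    # again cannot help. Refresh the cached version instead.
    _, fresh_version = store.read()
    return False, max_attempts, fresh_version


store = VersionedStore(value=[1, 2, 3])
cached_version = store.version  # the broker caches version 0

# Another writer updates the state, so the broker's cache is now stale.
store.conditional_write([2, 3], expected_version=0)

ok, attempts, version = update_isr(store, [1, 2], cached_version)
print(ok, attempts, version)
```

The sketch only refreshes the version after giving up. Whether a real broker should also re-check who the leader is at that point is the open question in this issue.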



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
