[jira] [Comment Edited] (KAFKA-1120) Controller could miss a broker state change

Jun Rao (JIRA) Tue, 22 Nov 2016 17:04:27 -0800

    [ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325496#comment-15325496
 ]


Jun Rao edited comment on KAFKA-1120 at 11/23/16 1:03 AM:
----------------------------------------------------------

A better way is probably for the controller to store the czxid (which is 
guaranteed to be unique and monotonically increasing) of each broker 
registration path. When a ZK watcher fires, the controller can read the 
current czxid of each broker registration and see whether it has changed. If 
so, the controller will treat the broker as if it had failed and then restarted.
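
To make the idea concrete, here is a rough sketch of the comparison, not the actual controller code. It uses an in-memory stand-in for ZK, so FakeZk, Registration and diff_brokers are hypothetical names made up for illustration. In a real deployment the czxid would come from the Stat of each broker registration znode. The sketch assumes a delete also consumes a zxid, which is why the re-created znode gets a larger czxid.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Registration:
    broker_id: int
    czxid: int  # creation zxid assigned when the znode was created


class FakeZk:
    """In-memory stand-in for the broker registration path (hypothetical)."""

    def __init__(self):
        self._zxid = count(1)  # monotonically increasing transaction ids
        self._brokers = {}

    def register(self, broker_id):
        self._brokers[broker_id] = Registration(broker_id, next(self._zxid))

    def deregister(self, broker_id):
        next(self._zxid)  # assumption: the delete transaction also uses a zxid
        del self._brokers[broker_id]

    def snapshot(self):
        """Return {broker_id: czxid} for every currently registered broker."""
        return {b: r.czxid for b, r in self._brokers.items()}


def diff_brokers(cached, current):
    """Classify brokers as new, dead, or bounced by comparing czxids."""
    new = sorted(current.keys() - cached.keys())
    dead = sorted(cached.keys() - current.keys())
    bounced = sorted(
        b for b in cached.keys() & current.keys() if cached[b] != current[b]
    )
    return new, dead, bounced


zk = FakeZk()
zk.register(1)
zk.register(2)
cached = zk.snapshot()  # what the controller remembered before getting busy

# Broker 2 de-registers and re-registers while the controller holds its lock.
zk.deregister(2)
zk.register(2)
current = zk.snapshot()  # what the controller reads when the watcher fires

# Comparing broker ids alone shows no change, so the bounce would be missed.
assert set(cached) == set(current)

new, dead, bounced = diff_brokers(cached, current)
print(new, dead, bounced)  # broker 2 shows up only in the bounced list
```

A broker in the bounced list would then go through the same handling as a broker failure followed by a broker startup.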


was (Author: junrao):
A better way is probably for the controller to store the ZK version of the 
broker registration path. When a ZK watcher is fired, the controller can read 
the current ZK version of each of the broker registration and see if it has 
changed. If so, the controller will treat the broker as it has failed and then 
restarted.

> Controller could miss a broker state change 
> --------------------------------------------
>
>                 Key: KAFKA-1120
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1120
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Jun Rao
>
> When the controller is in the middle of processing a task (e.g., preferred 
> leader election, broker change), it holds a controller lock. During this 
> time, a broker could have de-registered and re-registered itself in ZK. After 
> the controller finishes processing the current task, it will start processing 
> the logic in the broker change listener. However, it will see no broker 
> change and therefore won't do anything to the restarted broker. This broker 
> will be in a weird state since the controller doesn't inform it to become the 
> leader of any partition. Yet, the cached metadata in other brokers could 
> still list that broker as the leader for some partitions. Client requests 
> routed to that broker will then get a TopicOrPartitionNotExistException. This 
> broker will continue to be in this bad state until it's restarted again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
