[ 
https://issues.apache.org/jira/browse/KAFKA-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105612#comment-15105612
 ] 

Flavio Junqueira commented on KAFKA-3083:
-----------------------------------------

[~mgharat]

bq. I was just thinking if we can modify the controller code to always check if 
it is the controller before it makes such changes to zookeeper.

In principle, there is the race that [~junrao] mentioned, but I was thinking 
that one possibility would be to use a multi-op that combines the update to the 
ISR and a znode check. The znode check verifies that the version of the 
controller leadership znode is still the same and if it passes, then the ISR 
data is updated. Using the scenario in the description to illustrate, when 
broker A tries to update the ISR state in ZK in step 3, the operation fails 
because the version of the controller leadership znode has changed.
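
To make the idea concrete, here is a minimal sketch in Java. It is an in-memory model of versioned znodes, not the ZooKeeper client and not the Kafka controller code. The class names, the checkAndSet helper, and both znode paths are hypothetical, and the model assumes that the version of the leadership znode changes whenever controller leadership changes. With the real client, I would expect the same shape to come from passing Op.check and Op.setData to ZooKeeper.multi.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of versioned znodes. This is NOT the ZooKeeper client API;
// it only illustrates the check-then-update semantics of a multi-op.
class MultiOpSketch {
    // A znode holds data plus a version that increments on every write.
    static final class Znode {
        String data;
        int version;

        Znode(String data) {
            this.data = data;
            this.version = 0;
        }
    }

    static final class Store {
        private final Map<String, Znode> nodes = new HashMap<>();

        synchronized void create(String path, String data) {
            nodes.put(path, new Znode(data));
        }

        synchronized int version(String path) {
            return nodes.get(path).version;
        }

        synchronized String data(String path) {
            return nodes.get(path).data;
        }

        // Unconditional write; bumps the version of the znode.
        synchronized void setData(String path, String data) {
            Znode node = nodes.get(path);
            node.data = data;
            node.version++;
        }

        // Models a multi-op made of a version check on one znode and a write
        // to another: either both take effect or neither does.
        synchronized boolean checkAndSet(String checkPath, int expectedVersion,
                                         String dataPath, String newData) {
            Znode guard = nodes.get(checkPath);
            if (guard == null || guard.version != expectedVersion) {
                return false; // check failed, so the write is not applied
            }
            setData(dataPath, newData);
            return true;
        }
    }

    // Hypothetical paths, chosen for illustration only.
    static final String LEADERSHIP = "/controller-leadership";
    static final String ISR = "/partition-0/isr";

    public static void main(String[] args) {
        Store zk = new Store();
        zk.create(LEADERSHIP, "A");
        zk.create(ISR, "A,B,C");

        // Step 1: broker A is the controller and records the version it saw.
        int versionSeenByA = zk.version(LEADERSHIP);

        // Step 2: A's session expires and B takes over (assumed to bump the version).
        zk.setData(LEADERSHIP, "B");

        // Step 3: A tries to shrink the ISR, guarded by the version it saw.
        boolean applied = zk.checkAndSet(LEADERSHIP, versionSeenByA, ISR, "A,B");

        System.out.println("stale write applied: " + applied);
        System.out.println("ISR in store: " + zk.data(ISR));
    }
}
```

In this model the guarded write in step 3 is rejected and the stored ISR stays at its original value, which is the behavior we want from broker A once it is no longer the controller.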

The solution of handling the connection loss event is typical, but we could 
consider adding a multi-op to be extra safe against these spurious writes. 

> a soft failure in controller may leave a topic partition in an inconsistent 
> state
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-3083
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3083
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.0
>            Reporter: Jun Rao
>            Assignee: Mayuresh Gharat
>
> The following sequence can happen.
> 1. Broker A is the controller and is in the middle of processing a broker 
> change event. As part of this process, let's say it's about to shrink the isr 
> of a partition.
> 2. Then broker A's session expires and broker B takes over as the new 
> controller. Broker B sends the initial leaderAndIsr request to all brokers.
> 3. Broker A continues by shrinking the isr of the partition in ZK and sends 
> the new leaderAndIsr request to the broker (say C) that leads the partition. 
> Broker C will reject this leaderAndIsr since the request comes from a 
> controller with an older epoch. Now we could be in a situation that Broker C 
> thinks the isr has all replicas, but the isr stored in ZK is different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
