[ 
https://issues.apache.org/jira/browse/KAFKA-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio reassigned KAFKA-9484:
-------------------------------------------------

    Assignee: Jose Armando Garcia Sancio  (was: Jason Gustafson)

> Unnecessary LeaderAndIsr update following reassignment completion
> -----------------------------------------------------------------
>
>                 Key: KAFKA-9484
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9484
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Jose Armando Garcia Sancio
>            Priority: Major
>
> Following the completion of the reassignment, the controller executes two 
> steps: first, it elects a new leader (if needed) and sends a LeaderAndIsr 
> update (in any case) with the new target replica set; second, it removes 
> unneeded replicas from the replica set and sends another round of 
> LeaderAndIsr updates. I doubt the need for the first round of updates 
> in the case that the leader doesn't need changing. 
> For example, suppose we have the following reassignment state: 
> replicas=[1,2,3,4], adding=[4], removing=[1], isr=[1,2,3,4], leader=2, 
> epoch=10
> First the controller will bump the epoch with the target replica set, which 
> will result in a round of LeaderAndIsr requests to the target replica set 
> with the following state: 
> replicas=[2,3,4], adding=[], removing=[], isr=[1,2,3,4], leader=2, epoch=11 
> Immediately following this, the controller will bump the epoch again and 
> remove the unneeded replica. This will result in another round of 
> LeaderAndIsr requests with the following state: 
> replicas=[2,3,4], adding=[], removing=[], isr=[2,3,4], leader=2, epoch=12 
> The first round of LeaderAndIsr updates puzzles me a bit. It is justified in 
> the code with this comment: 
> {code} 
> B3. Send a LeaderAndIsr request with RS = TRS. This will prevent the leader 
> from adding any replica in TRS - ORS back in the isr. 
> {code} 
> (I think the comment is backwards. It should be ORS (original replica set) - 
> TRS (target replica set).) 
> It sounds like we are trying to prevent a member of ORS from being added back 
> to the ISR, but even if it did get added, it would be removed in the next 
> step anyway. In the uncommon case that an ORS replica is out of sync, there 
> does not seem to be any benefit to this first update since it is basically 
> paying the cost of one write in order to save the speculative cost of one 
> write. Additionally, it would be useful if the protocol could enforce the 
> invariant that the ISR is always a subset of the replica set.
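
The two rounds described above can be modelled in a few lines of Python. This
is only an illustrative sketch, not controller code: PartitionState,
two_round_completion and single_round_completion are hypothetical names, and
the single-round variant is an assumption about what skipping the first update
might look like when the leader stays the same.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical model of the state quoted in the ticket (not a Kafka class)."""
    replicas: Tuple[int, ...]
    adding: Tuple[int, ...]
    removing: Tuple[int, ...]
    isr: Tuple[int, ...]
    leader: int
    epoch: int

    def isr_within_replicas(self) -> bool:
        # The invariant the ticket would like the protocol to enforce.
        return set(self.isr) <= set(self.replicas)


def target_replicas(state: PartitionState) -> Tuple[int, ...]:
    # Target replica set (TRS): current replicas minus those being removed.
    return tuple(r for r in state.replicas if r not in state.removing)


def two_round_completion(state: PartitionState) -> List[PartitionState]:
    """Current behaviour as described: two epoch bumps, two update rounds."""
    trs = target_replicas(state)
    # Round 1: replica set becomes TRS, ISR is left untouched.
    first = replace(state, replicas=trs, adding=(), removing=(),
                    epoch=state.epoch + 1)
    # Round 2: replicas outside TRS are dropped from the ISR.
    second = replace(first, isr=tuple(r for r in first.isr if r in trs),
                     epoch=first.epoch + 1)
    return [first, second]


def single_round_completion(state: PartitionState) -> PartitionState:
    """Assumed alternative: one update when the leader does not change."""
    trs = target_replicas(state)
    if state.leader not in trs:
        raise ValueError("leader must change; a single round does not apply")
    return replace(state, replicas=trs, adding=(), removing=(),
                   isr=tuple(r for r in state.isr if r in trs),
                   epoch=state.epoch + 1)
```

Starting from the example state in the description (replicas=[1,2,3,4],
removing=[1], leader=2, epoch=10), the intermediate state of the two-round
path has replica 1 in the ISR but not in the replica set, so
isr_within_replicas() is false there; the single-round sketch never passes
through such a state.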



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
