[ https://issues.apache.org/jira/browse/KAFKA-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jose Armando Garcia Sancio reassigned KAFKA-9484:
-------------------------------------------------

    Assignee: Jose Armando Garcia Sancio  (was: Jason Gustafson)

> Unnecessary LeaderAndIsr update following reassignment completion
> -----------------------------------------------------------------
>
>                 Key: KAFKA-9484
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9484
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Jose Armando Garcia Sancio
>            Priority: Major
>
> Following the completion of the reassignment, the controller executes two
> steps: first, it elects a new leader (if needed) and sends a LeaderAndIsr
> update (in any case) with the new target replica set; second, it removes
> unneeded replicas from the replica set and sends another round of
> LeaderAndIsr updates. I doubt the need for the first round of updates
> in the case that the leader doesn't need changing.
> For example, suppose we have the following reassignment state:
> replicas=[1,2,3,4], adding=[4], removing=[1], isr=[1,2,3,4], leader=2,
> epoch=10
> First, the controller will bump the epoch with the target replica set, which
> will result in a round of LeaderAndIsr requests to the target replica set
> with the following state:
> replicas=[2,3,4], adding=[], removing=[], isr=[1,2,3,4], leader=2, epoch=11
> Immediately following this, the controller will bump the epoch again and
> remove the unneeded replica. This will result in another round of
> LeaderAndIsr requests with the following state:
> replicas=[2,3,4], adding=[], removing=[], isr=[2,3,4], leader=2, epoch=12
> The first round of LeaderAndIsr updates puzzles me a bit. It is justified in
> the code with this comment:
> {code}
> B3. Send a LeaderAndIsr request with RS = TRS. This will prevent the leader
> from adding any replica in TRS - ORS back in the isr.
> {code}
> (I think the comment is backwards. It should be ORS (original replica set) -
> TRS (target replica set).)
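
The two rounds in the example above can be replayed with a small model. This is only a sketch under stated assumptions, not the controller's actual code: `PartitionState`, `apply_target_replica_set`, `drop_unneeded_from_isr`, and `isr_within_replicas` are hypothetical names introduced for illustration, and leader election is left out because the leader does not change in the example.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical, simplified view of the state listed in the example."""
    replicas: Tuple[int, ...]
    adding: Tuple[int, ...]
    removing: Tuple[int, ...]
    isr: Tuple[int, ...]
    leader: int
    epoch: int


def apply_target_replica_set(state: PartitionState) -> PartitionState:
    """First round: bump the epoch and shrink replicas to the target set.

    The ISR is left untouched, as in the epoch=11 state of the example.
    """
    target = tuple(r for r in state.replicas if r not in state.removing)
    return replace(state, replicas=target, adding=(), removing=(),
                   epoch=state.epoch + 1)


def drop_unneeded_from_isr(state: PartitionState) -> PartitionState:
    """Second round: bump the epoch and drop non-replicas from the ISR."""
    isr = tuple(r for r in state.isr if r in state.replicas)
    return replace(state, isr=isr, epoch=state.epoch + 1)


def isr_within_replicas(state: PartitionState) -> bool:
    """True when every ISR member is also in the replica set."""
    return set(state.isr) <= set(state.replicas)


# State from the example: replicas=[1,2,3,4], adding=[4], removing=[1], epoch=10
initial = PartitionState(replicas=(1, 2, 3, 4), adding=(4,), removing=(1,),
                         isr=(1, 2, 3, 4), leader=2, epoch=10)
after_first = apply_target_replica_set(initial)
after_second = drop_unneeded_from_isr(after_first)

for label, state in (("first round", after_first), ("second round", after_second)):
    print(label, state, "ISR within replicas:", isr_within_replicas(state))
```

In this model the state after the first round still lists broker 1 in the ISR even though it is no longer in the replica set; only the second round brings the two back in line.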
> It sounds like we are trying to prevent a member of ORS from being added back
> to the ISR, but even if it did get added, it would be removed in the next
> step anyway. In the uncommon case that an ORS replica is out of sync, there
> does not seem to be any benefit to this first update since it is basically
> paying the cost of one write in order to save the speculative cost of one
> write. Additionally, it would be useful if the protocol could enforce the
> invariant that the ISR is always a subset of the replica set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)