-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14865/#review27459
-----------------------------------------------------------
Thanks for the patch. There are a couple of issues.

1. The main one is during the phase of partition reassignment when we bootstrap new replicas. At this point, the assigned replica list doesn't include the new replicas. If we only allow replicas in the assigned replica set to be added to the ISR, those new replicas won't be added to the ISR, which will prevent partition reassignment from completing. We could include those new replicas in the all-replica set in the LeaderAndIsr request. We probably have to think a bit more to see if there is any other impact.

2. Once the assigned replica set in the broker is updated, we need to prevent an old replica from being added back to this set again. Currently, Partition.updateLeaderHWAndMaybeExpandIsr() (triggered by a fetch request) calls getOrCreateReplica(), which can cause a replica to be added back to the assigned replica set. What we can do is call getOrCreateReplica() only during makeLeader() and makeFollower(). In the former, we force all replicas to be created; in the latter, we just need to make sure the local replica is created. In Partition.updateLeaderHWAndMaybeExpandIsr(), we can then use getReplica() instead of getOrCreateReplica().

- Jun Rao


On Oct. 23, 2013, 4:34 a.m., Neha Narkhede wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14865/
> -----------------------------------------------------------
> 
> (Updated Oct. 23, 2013, 4:34 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1097
>     https://issues.apache.org/jira/browse/KAFKA-1097
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1097 Race condition while reassigning low throughput partition leads to incorrect ISR information in zookeeper. The changes include 1) adding the ISR shrink logic as part of the OfflineReplica -> NonExistentReplica state change, 2) adding a safety check on the broker where it only expands the ISR if the replica is in the assigned replica list, and 3) updating the assigned replica list on the broker on every makeLeader request, and also on makeFollower request for safety, though that's not strictly required. These changes will ensure that the ISR is shrunk by the controller and the leader has an updated assigned replica list. So even if a replica sends a fetch request after the ISR is shrunk by the controller, the broker will not be able to update the ISR until it receives the next LeaderAndIsrRequest (which notifies it of the latest zkVersion of the partition state path) that also contains the shrunk ISR and assigned replica list. Using that, the broker will avoid expanding the ISR if the replica is not present in the new assigned replica list.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/cluster/Partition.scala 60f3ed4e88b08d0dcf2ea84259ae6dbe9d7c3a2d 
>   core/src/main/scala/kafka/controller/KafkaController.scala 88d130f55997b72a8590e4cfe92857a7320e70d5 
>   core/src/main/scala/kafka/controller/ReplicaStateMachine.scala 212c05d65dcdc147e55f90875bacc940e30342bf 
>   core/src/main/scala/kafka/server/ReplicaManager.scala 03ba60e82cdb3dce100603d615894ede47e4b077 
>   kafka-patch-review.py 82ea9a890fe79aad7d0ea6d33f3e2780e036317c 
> 
> Diff: https://reviews.apache.org/r/14865/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Neha Narkhede
> 
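
P.S. For readers following along, the guard discussed in point 2 of the review can be sketched roughly as below. This is a simplified illustration in Java, not the actual Kafka code (which is Scala); the class `PartitionSketch` and its methods are hypothetical stand-ins for `Partition`, `getReplica()`, and `updateLeaderHWAndMaybeExpandIsr()`. The key idea is that the fetch path only *looks up* replicas and never creates them, so a replica that was removed from the assigned set cannot sneak back into it or into the ISR:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Simplified stand-in for kafka.cluster.Partition (names are illustrative).
class PartitionSketch {
    private final Set<Integer> assignedReplicas;           // updated on makeLeader()/makeFollower()
    private final Map<Integer, String> replicas = new HashMap<>();
    private final Set<Integer> isr = new HashSet<>();

    PartitionSketch(Set<Integer> assignedReplicas) {
        this.assignedReplicas = assignedReplicas;
        // Analogue of makeLeader(): the only place where replicas are created.
        // (Per point 1 of the review, a real implementation would also have to
        // account for new replicas being bootstrapped during reassignment.)
        for (int id : assignedReplicas) {
            replicas.put(id, "replica-" + id);
        }
    }

    // Analogue of getReplica(): look up only, never create.
    Optional<String> getReplica(int brokerId) {
        return Optional.ofNullable(replicas.get(brokerId));
    }

    // Analogue of updateLeaderHWAndMaybeExpandIsr(): a fetch from a replica
    // that is no longer assigned must not re-create it or re-enter the ISR.
    boolean maybeExpandIsr(int fetcherBrokerId) {
        if (getReplica(fetcherBrokerId).isPresent()
                && assignedReplicas.contains(fetcherBrokerId)) {
            isr.add(fetcherBrokerId);
            return true;
        }
        return false; // stale replica: ignored until the next LeaderAndIsrRequest
    }

    Set<Integer> currentIsr() {
        return isr;
    }
}

public class IsrGuardDemo {
    public static void main(String[] args) {
        PartitionSketch p = new PartitionSketch(Set.of(1, 2, 3));
        System.out.println(p.maybeExpandIsr(2)); // true: assigned replica joins the ISR
        System.out.println(p.maybeExpandIsr(4)); // false: unassigned replica is rejected
        System.out.println(p.currentIsr());     // [2]
    }
}
```

In this sketch, had `maybeExpandIsr()` used a getOrCreateReplica()-style call, the fetch from broker 4 would have re-inserted it into the replica map, which is exactly the regression the review warns about.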