Hi Stanislav,

Thanks for the KIP. I think this is a nice solution to the problem of not wanting to change the replication factor during reassignments.

Just from a writing point of view, it would be nice for the first paragraph to be a bit more explicit about this goal. Maybe lead with "Many times, we don't want to change the replication factor of a partition during reassignment..." As it is, we're talking about metadata races before we've even explained what goal those races are thwarting. :)

I like the RPC and command-line format changes; they are well-done and well-written.

One thing we do need to spell out, though, is what the behavior is when the server does not support this new option. The simplest thing to do would be for the client to throw UnsupportedVersionException with an exception message indicating what the problem is. Then the caller could catch this and retry the call without the flag (or give up, as appropriate).

The other option is to continue on but not actually protect the replication factor. If we do this, at minimum we'd need to rename the flag to something like "try to protect replication factor" to make it clear that it's best-effort.

It's sort of debatable which way is better. In principle the UVE sounds nicer, but in practice maybe the other behavior is best? I suspect most systems would turn around and retry without the flag in the event of a UVE...

best,
Colin

On Thu, Aug 4, 2022, at 13:37, Vikas Singh wrote:
> Thanks Stanislav for the KIP. Seems like a reasonable proposal,
> preventing users from accidentally altering the replica set under certain
> conditions. I have a couple of comments:
>
> >> In the case of an already-reassigning partition being reassigned again,
> >> the validation compares the targetReplicaSet size of the reassignment to
> >> the targetReplicaSet size of the new reassignment and throws if those
> >> differ.
>
> Can you add more detail to this, or clarify what targetReplicaSet is (e.g.
> why not sourceReplicaSet?) and how the target replica set will be
> calculated?
>
> And what about the reassign partitions CLI? Do we want to expose the option
> there too?
>
> Cheers,
> Vikas
>
> On Thu, Jul 28, 2022 at 1:59 AM Stanislav Kozlovski <stanis...@confluent.io>
> wrote:
>
>> Hey all,
>>
>> I'd like to start a discussion on a proposal to keep API users from
>> inadvertently increasing the replication factor of a topic through
>> the alter partition reassignments API. The KIP describes two fairly
>> easy-to-hit race conditions in which this can happen.
>>
>> The KIP itself is pretty simple, yet has a couple of alternatives that can
>> help solve the same problem. I would appreciate thoughts from the community
>> on how you think we should proceed, and whether the proposal makes sense in
>> the first place.
>>
>> Thanks!
>>
>> KIP:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-860%3A+Add+client-provided+option+to+guard+against+replication+factor+change+during+partition+reassignments
>> JIRA: https://issues.apache.org/jira/browse/KAFKA-14121
>>
>> --
>> Best,
>> Stanislav
>>
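
P.S. To make the UnsupportedVersionException option above concrete, here is a rough sketch of the catch-and-retry pattern I expect most callers would end up writing. Everything in it is an assumption for illustration: ReassignmentClient and its reassign(...) method are hypothetical stand-ins rather than the real Admin API, the boolean flag is just a placeholder for whatever option the KIP settles on, and the nested UnsupportedVersionException is a local substitute for the Kafka class so the snippet compiles without the client jar.

```java
import java.util.List;

public class ReassignmentFallbackSketch {

    // Local stand-in for Kafka's UnsupportedVersionException (assumption: used
    // only so this sketch is self-contained).
    static class UnsupportedVersionException extends RuntimeException {
        UnsupportedVersionException(String message) {
            super(message);
        }
    }

    // Hypothetical client abstraction; not the real Admin interface.
    interface ReassignmentClient {
        void reassign(List<Integer> targetReplicas, boolean protectReplicationFactor);
    }

    /**
     * Tries the reassignment with the guard enabled and, if the server rejects
     * the option as unsupported, retries once without it.
     *
     * @return true if the guard was in effect, false if we fell back
     */
    static boolean reassignWithFallback(ReassignmentClient client, List<Integer> targetReplicas) {
        try {
            client.reassign(targetReplicas, true);
            return true;
        } catch (UnsupportedVersionException e) {
            // The caller knowingly gives up the protection here.
            System.err.println("Guard unsupported, retrying without it: " + e.getMessage());
            client.reassign(targetReplicas, false);
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated old broker: rejects any request that sets the flag.
        ReassignmentClient oldBroker = (replicas, protect) -> {
            if (protect) {
                throw new UnsupportedVersionException("server does not support the replication factor guard");
            }
        };
        // Simulated new broker: accepts the flag.
        ReassignmentClient newBroker = (replicas, protect) -> { };

        System.out.println("old broker guarded: " + reassignWithFallback(oldBroker, List.of(1, 2, 3)));
        System.out.println("new broker guarded: " + reassignWithFallback(newBroker, List.of(1, 2, 3)));
    }
}
```

If nearly every caller writes this wrapper, the UVE mostly adds a round trip, which is the argument for the best-effort naming instead. The return value at least lets the caller log that the guard was not applied.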