Hi Colin, I think storing reassignment state at the partition level is the right move and I also agree that replicas should understand that there is a reassignment in progress. This makes KIP-352 a trivial follow-up for example. The only doubt I have is whether the leader and isr znode is the right place to store the target reassignment. It is a bit odd to keep the target assignment in a separate place from the current assignment, right? I assume the thinking is probably that although the current assignment should probably be in the leader and isr znode as well, it is hard to move the state in a compatible way. Is that right? But if we have no plan to remove the assignment znode, do you see a downside to storing the target assignment there as well?
A few additional questions: 1. Should `alterPartitionReassignments` be `alterPartitionAssignments`? It's the current assignment we're altering, right? 2. Does this change affect the Metadata API? In other words, are clients aware of reassignments? If so, then we probably need a change to UpdateMetadata as well. The only alternative I can think of would be to represent the replica set in the Metadata request as the union of the current and target replicas, but I can't think of any benefit to hiding reassignments. Note that if we did this, we probably wouldn't need a separate API to list reassignments. 3. As replicas come into sync, they will join the ISR. Will we await all target replicas joining the ISR before taking the replica out of the target replicas set? Also, I assume that target replicas can still be elected as leader? 4. Probably useful to mention permissions for the new APIs. Thanks, Jason On Fri, May 10, 2019 at 9:30 AM Gwen Shapira <g...@confluent.io> wrote: > +1 (binding) > Looks great, and will be awesome to have this new capability. > > On Wed, May 8, 2019 at 10:23 PM Colin McCabe <cmcc...@apache.org> wrote: > > > Hi all, > > > > I'd like to start the vote for KIP-455: Create an Administrative API for > > Replica Reassignment. I think this KIP is important since it will unlock > > many follow-on improvements to Kafka reassignment (see the "Future work" > > section, plus a lot of the other discussions we've had recently about > > reassignment). It also furthers the important KIP-4 goal of removing > > direct access to ZK. > > > > I made a few changes based on the discussion in the [DISCUSS] thread. As > > Robert suggested, I removed the need to explicitly cancel a reassignment > > for a partition before setting up a different reassignment for that > > specific partition. I also simplified the API a bit by adding a > > PartitionReassignment class which is used by both the alter and list > APIs. > > > > I modified the proposal so that we now deprecate the old znode-based API > > rather than removing it completely. That should give external > rebalancing > > tools some time to transition to the new API. > > > > To clarify a question Viktor asked, I added a note that the > > kafka-reassign-partitions.sh will now use a --bootstrap-server argument > to > > contact the admin APIs. > > > > thanks, > > Colin > > > > > -- > *Gwen Shapira* > Product Manager | Confluent > 650.450.2760 | @gwenshap > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog > <http://www.confluent.io/blog> >