On Fri, May 10, 2019, at 16:43, Jason Gustafson wrote: > Hi Colin, > > I think storing reassignment state at the partition level is the right move > and I also agree that replicas should understand that there is a > reassignment in progress. This makes KIP-352 a trivial follow-up for > example. The only doubt I have is whether the leader and isr znode is the > right place to store the target reassignment. It is a bit odd to keep the > target assignment in a separate place from the current assignment, right? I > assume the thinking is probably that although the current assignment should > probably be in the leader and isr znode as well, it is hard to move the > state in a compatible way. Is that right? But if we have no plan to remove > the assignment znode, do you see a downside to storing the target > assignment there as well? >
Hi Jason, That's a good point -- it's probably better to keep the target assignment in the same znode as the current assignment, for consistency. I'll change the KIP. > A few additional questions: > > 1. Should `alterPartitionReassignments` be `alterPartitionAssignments`? > It's the current assignment we're altering, right? That's fair. AlterPartitionAssigments reads a little better, and I'll change it to that. > 2. Does this change affect the Metadata API? In other words, are clients > aware of reassignments? If so, then we probably need a change to > UpdateMetadata as well. The only alternative I can think of would be to > represent the replica set in the Metadata request as the union of the > current and target replicas, but I can't think of any benefit to hiding > reassignments. Note that if we did this, we probably wouldn't need a > separate API to list reassignments. I thought about this a bit... and I think on balance, you're right. We should keep this information together with the replica nodes, isr nodes, and offline replicas, and that information is available in the MetadataResponse. However, I do think in order to do this, we'll need a flag in the MetadataRequest that specifiies "only show me reassigning partitions". I'll add this. > 3. As replicas come into sync, they will join the ISR. Will we await all > target replicas joining the ISR before taking the replica out of the target > replicas set? Also, I assume that target replicas can still be elected as > leader? We'll take a replica out of the target replicas set as soon as that replica is in the ISR. Let me clarify this in the KIP. > 4. Probably useful to mention permissions for the new APIs. Good point. I think alterPartitionAssignments should require ALTER on CLUSTER. MetadataRequest permissions will be unchanged. best, Colin > > Thanks, > Jason > > On Fri, May 10, 2019 at 9:30 AM Gwen Shapira <g...@confluent.io> wrote: > > > +1 (binding) > > Looks great, and will be awesome to have this new capability. > > > > On Wed, May 8, 2019 at 10:23 PM Colin McCabe <cmcc...@apache.org> wrote: > > > > > Hi all, > > > > > > I'd like to start the vote for KIP-455: Create an Administrative API for > > > Replica Reassignment. I think this KIP is important since it will unlock > > > many follow-on improvements to Kafka reassignment (see the "Future work" > > > section, plus a lot of the other discussions we've had recently about > > > reassignment). It also furthers the important KIP-4 goal of removing > > > direct access to ZK. > > > > > > I made a few changes based on the discussion in the [DISCUSS] thread. As > > > Robert suggested, I removed the need to explicitly cancel a reassignment > > > for a partition before setting up a different reassignment for that > > > specific partition. I also simplified the API a bit by adding a > > > PartitionReassignment class which is used by both the alter and list > > APIs. > > > > > > I modified the proposal so that we now deprecate the old znode-based API > > > rather than removing it completely. That should give external > > rebalancing > > > tools some time to transition to the new API. > > > > > > To clarify a question Viktor asked, I added a note that the > > > kafka-reassign-partitions.sh will now use a --bootstrap-server argument > > to > > > contact the admin APIs. > > > > > > thanks, > > > Colin > > > > > > > > > -- > > *Gwen Shapira* > > Product Manager | Confluent > > 650.450.2760 | @gwenshap > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog > > <http://www.confluent.io/blog> > > >