Hi David, Thanks for the KIP!
The KIP refers to "the KIP-500 bridge release (version 2.6.0 as of the time of this proposal)". This is out of date-- the bridge release will be one of the 3.x releases. We should either update this to 3.0, or perhaps just take out the reference to a specific version, since it's not necessary to understand the rest of the KIP. > ... and potentially could replace the existing controlled shutdown RPC. Since > this RPC > is somewhat generic, it could also be used to mark a replicas a "online" > following some > kind of log dir recovery procedure (out of scope for this proposal). I think it would be good to move this part into the "Future Work" section. > The Reason field is an optional textual description of why the event is being > sent Since we implemented optional fields in KIP-482, describing this field as "optional" might be confusing. Probably better to avoid describing it that way, unless it's a tagged field. > - If no Topic is given, it is implied that all topics on this broker are > being indicated > - If a Topic and no partitions are given, it is implied that all partitions > of this topic are being indicated I would prefer to leave out these "shortcuts" since they seem likely to lead to confusion and bugs. For example, suppose that the controller has just created a new partition for topic "foo" and put it on broker 3. But then, before broker 3 gets the LeaderAndIsrRequest from the controller, broker 3 get a bad log directory. So it sends an AlterReplicaStateRequest to the controller specifying topic foo and leaving out the partition list (using the first "shortcut".) The new partition will get marked as offline even though it hasn't even been created, much less assigned to the bad log directory. Since log directory failures are rare, spelling out the full set of affected partitions when one happens doesn't seem like that much of a burden. This is also consistent with what we currently do. In fact, it's much more efficient than what we currently do, since with KIP-589, we won't have to enumerate partitions that aren't on the failed log directory. In the future work section: If we eventually want to replace ControlledShutdownRequest with this RPC, we'll need some additional functionality. Specifically, we'll need the ability to tell the controller to stop putting new partitions on the broker that sent the request. That could be done with a separate request or possibly additional flags on this request. In any case, we don't have to solve that problem now. Thanks again for the KIP... great to see this moving forward. regards, Colin On Wed, May 20, 2020, at 12:22, David Arthur wrote: > Hello, all. I'd like to start the vote for KIP-589 which proposes to add a > new AlterReplicaState RPC. > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller > > Cheers, > David >