Hi David,

Thanks for the KIP!

The KIP refers to "the KIP-500 bridge release (version 2.6.0 as of the time of 
this proposal)".  This is out of date-- the bridge release will be one of the 
3.x releases.  We should either update this to 3.0, or perhaps  just take out 
the reference to a specific version, since it's not necessary to understand the 
rest of the KIP.

> ... and potentially could replace the existing controlled shutdown RPC. Since 
> this RPC 
> is somewhat generic, it could also be used to mark a replicas a "online" 
> following some 
> kind of log dir recovery procedure (out of scope for this proposal).

I think it would be good to move this part into the "Future Work" section.

> The Reason field is an optional textual description of why the event is being 
> sent

Since we implemented optional fields in KIP-482, describing this field as 
"optional" might be confusing.  Probably better to avoid describing it that 
way, unless it's a tagged field.

> - If no Topic is given, it is implied that all topics on this broker are 
> being indicated
> - If a Topic and no partitions are given, it is implied that all partitions 
> of this topic are being indicated

I would prefer to leave out these "shortcuts" since they seem likely to lead to 
confusion and bugs.

For example, suppose that  the controller has just created a new partition for 
topic "foo" and put it on broker 3.  But then, before broker 3 gets the 
LeaderAndIsrRequest from the controller, broker 3 get a bad log directory.  So 
it sends an AlterReplicaStateRequest to the controller specifying topic foo and 
leaving out the partition list (using the first "shortcut".)  The new partition 
will get marked as offline even though it hasn't even been created, much less 
assigned to the bad log directory.

Since log directory failures are rare, spelling out the full set of affected 
partitions when one happens doesn't seem like that much of a burden.  This is 
also consistent with what we currently do.  In fact, it's much more efficient 
than what we currently do, since with KIP-589, we won't have to enumerate 
partitions that aren't on the failed log directory.

In the future work section: If we eventually want to replace 
ControlledShutdownRequest with this RPC, we'll need some additional 
functionality.  Specifically, we'll need the ability to tell the controller to 
stop putting new partitions on the broker that sent the request.  That could be 
done with a separate request or possibly additional flags on this request.  In 
any case, we don't have to solve that problem now.

Thanks again for the KIP... great to see this moving forward.

regards,
Colin


On Wed, May 20, 2020, at 12:22, David Arthur wrote:
> Hello, all. I'd like to start the vote for KIP-589 which proposes to add a
> new AlterReplicaState RPC.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> 
> Cheers,
> David
>

Reply via email to