Colin, thanks for the feedback. Good points. I've updated the KIP with your
suggestions.

-David

On Wed, May 27, 2020 at 4:05 PM Colin McCabe <cmcc...@apache.org> wrote:

> Hi David,
>
> Thanks for the KIP!
>
> The KIP refers to "the KIP-500 bridge release (version 2.6.0 as of the
> time of this proposal)".  This is out of date-- the bridge release will be
> one of the 3.x releases.  We should either update this to 3.0, or perhaps
> just take out the reference to a specific version, since it's not necessary
> to understand the rest of the KIP.
>
> > ... and potentially could replace the existing controlled shutdown RPC.
> Since this RPC
> > is somewhat generic, it could also be used to mark a replicas a "online"
> following some
> > kind of log dir recovery procedure (out of scope for this proposal).
>
> I think it would be good to move this part into the "Future Work" section.
>
> > The Reason field is an optional textual description of why the event is
> being sent
>
> Since we implemented optional fields in KIP-482, describing this field as
> "optional" might be confusing.  Probably better to avoid describing it that
> way, unless it's a tagged field.
>
> > - If no Topic is given, it is implied that all topics on this broker are
> being indicated
> > - If a Topic and no partitions are given, it is implied that all
> partitions of this topic are being indicated
>
> I would prefer to leave out these "shortcuts" since they seem likely to
> lead to confusion and bugs.
>
> For example, suppose that  the controller has just created a new partition
> for topic "foo" and put it on broker 3.  But then, before broker 3 gets the
> LeaderAndIsrRequest from the controller, broker 3 get a bad log directory.
> So it sends an AlterReplicaStateRequest to the controller specifying topic
> foo and leaving out the partition list (using the first "shortcut".)  The
> new partition will get marked as offline even though it hasn't even been
> created, much less assigned to the bad log directory.
>
> Since log directory failures are rare, spelling out the full set of
> affected partitions when one happens doesn't seem like that much of a
> burden.  This is also consistent with what we currently do.  In fact, it's
> much more efficient than what we currently do, since with KIP-589, we won't
> have to enumerate partitions that aren't on the failed log directory.
>
> In the future work section: If we eventually want to replace
> ControlledShutdownRequest with this RPC, we'll need some additional
> functionality.  Specifically, we'll need the ability to tell the controller
> to stop putting new partitions on the broker that sent the request.  That
> could be done with a separate request or possibly additional flags on this
> request.  In any case, we don't have to solve that problem now.
>
> Thanks again for the KIP... great to see this moving forward.
>
> regards,
> Colin
>
>
> On Wed, May 20, 2020, at 12:22, David Arthur wrote:
> > Hello, all. I'd like to start the vote for KIP-589 which proposes to add
> a
> > new AlterReplicaState RPC.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> >
> > Cheers,
> > David
> >
>


-- 
-David

Reply via email to