Hi Colin,

Yeah, that makes sense, thanks. I was thinking, longer term, that there are
other benefits to having the log dir information available to the
controller. For example, it would allow CREATE_TOPIC requests to include
the intended log dir for each replica. But that's
obviously completely out of scope for this KIP.

Kind regards,

Tom
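(For readers skimming the archive: the "RPC semantics" David describes further down the thread — an empty topics list implying every topic on the broker, and a topic listed with no partitions implying all of that topic's partitions — can be sketched roughly as below. The function and field names are illustrative only, not taken from the KIP.)

```python
# Rough sketch of the implied-topics semantics discussed in this thread.
# All names here are illustrative; the actual request schema may differ.

def resolve_affected_partitions(request_topics, broker_partitions):
    """Expand an AlterReplicaStates-style request into concrete partitions.

    broker_partitions: dict mapping topic -> set of partition ids hosted
    on the reporting broker.
    request_topics: list of (topic, [partition, ...]) pairs; an empty
    topics list, or an empty partition list, means "all".
    """
    if not request_topics:
        # No topics in the request: every partition on the broker is implied.
        return {topic: set(parts) for topic, parts in broker_partitions.items()}
    resolved = {}
    for topic, partitions in request_topics:
        if not partitions:
            # Topic listed with no partitions: all of that topic's
            # partitions (on this broker) are implied.
            resolved[topic] = set(broker_partitions.get(topic, set()))
        else:
            resolved[topic] = set(partitions)
    return resolved
```

As Tom notes below, a broker with several log dirs of which only some failed would still have to enumerate the affected partitions explicitly; the empty-request shorthand only helps when everything on the broker is affected.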

On Mon, May 4, 2020 at 10:11 PM Colin McCabe <cmcc...@apache.org> wrote:

> Hi Tom,
>
> As you said, the controller doesn't know about log directories, although
> individual brokers do.  So the brokers do currently have to enumerate all
> the partitions that need to be removed to the controllers explicitly.  So
> this KIP isn't changing anything in that regard.
>
> The current flow is:
> 1. ping ZK back-channel
> 2. controller sends a full LeaderAndIsrRequest to the broker
> 3. the broker sends a full response containing error codes for all
> partitions.  Partitions on the failed storage have a nonzero error code;
> the others have 0.
>
> The new flow is:
> 1. the broker sends an RPC with all the failed partitions
>
> So the new flow actually substantially reduces the amount of network
> traffic, since previously we sent a full LeaderAndIsrRequest containing all
> of the partitions.  Now we just send all the partitions in the failed
> storage directory.  That could still be a lot, but it would certainly be
> only a fraction of what a full LeaderAndIsrRequest would contain.
>
> Sorry if I'm repeating stuff you already figured out, but I just wanted to
> be more clear about this (I found it confusing too until David explained it
> to me originally...)
>
> best,
> Colin
>
>
> On Sat, May 2, 2020, at 10:30, Tom Bentley wrote:
> > Hi David,
> >
> > > In rejecting the alternative of having an RPC for log dir failures
> > > you say
> > >
> > > I guess what I really mean here is that I wanted to avoid exposing the
> > > notion of a log dir to the controller. I can update the description to
> > > reflect this.
> > >
> >
> > Ah, I think I see now. While each broker knows about its log dirs, this
> > isn't something that's stored in ZooKeeper or known to the controller.
> >
> >
> > > > It's also not completely clear that the cost of having to enumerate
> > > > all the partitions on a log dir was weighed against the perceived
> > > > benefit of a more flexible RPC.
> > >
> > > The enumeration isn't strictly required. In the "RPC semantics"
> > > section, I mention that if no topics are present in the RPC request,
> > > then all topics on the broker are implied. And if a topic is given
> > > with no partitions, all partitions for that topic (on the broker) are
> > > implied. Does this make sense?
> > >
> >
> > So the no-topics-present optimisation wouldn't be available to a broker
> > with >1 log dirs where only some of the log dirs failed. I don't suppose
> > that's a problem though.
> >
> > Thanks again,
> >
> > Tom
> >
> >
> > On Fri, May 1, 2020 at 5:48 PM David Arthur <mum...@gmail.com> wrote:
> >
> > > Jose/Colin/Tom, thanks for the feedback!
> > >
> > > > Partition level errors
> > >
> > > This was an oversight on my part; I meant to include these in the
> > > response RPC. I'll update that.
> > >
> > > > INVALID_REQUEST
> > >
> > > I'll update this text description; that was a copy/paste leftover
> > >
> > > > I think we should mention that the controller will keep its current
> > > > implementation of marking the replicas as offline because of failure
> > > > in the LeaderAndIsr response.
> > >
> > > Good suggestions, I'll add that.
> > >
> > > > Does EventType need to be an Int32?
> > >
> > > No, it doesn't. I'll update to Int8. Do we have an example of the enum
> > > paradigm in our RPC today? I'm curious if we actually map it to a real
> > > Java enum in the AbstractRequest/Response classes.
> > >
> > > > AlterReplicaStates
> > >
> > > Sounds good to me.
> > >
> > > > In rejecting the alternative of having an RPC for log dir failures
> > > > you say
> > >
> > > I guess what I really mean here is that I wanted to avoid exposing the
> > > notion of a log dir to the controller. I can update the description to
> > > reflect this.
> > >
> > > > It's also not completely clear that the cost of having to enumerate
> > > > all the partitions on a log dir was weighed against the perceived
> > > > benefit of a more flexible RPC.
> > >
> > > The enumeration isn't strictly required. In the "RPC semantics"
> > > section, I mention that if no topics are present in the RPC request,
> > > then all topics on the broker are implied. And if a topic is given
> > > with no partitions, all partitions for that topic (on the broker) are
> > > implied. Does this make sense?
> > >
> > > Thanks again! I'll update the KIP and leave a message here once it's
> > > revised.
> > >
> > > David
> > >
> > > On Wed, Apr 29, 2020 at 11:20 AM Tom Bentley <tbent...@redhat.com>
> > > wrote:
> > >
> > > > Hi David,
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > In rejecting the alternative of having an RPC for log dir failures
> > > > you say:
> > > >
> > > > > It was also rejected to prevent "leaking" the notion of a log dir
> > > > > to the public API.
> > > > >
> > > >
> > > > I'm not quite sure I follow that argument, since we already have
> > > > RPCs for changing replica log dirs. So in a general sense log dirs
> > > > already exist in the API. I suspect you were using public API to mean
> > > > something more specific; could you elaborate?
> > > >
> > > > It's also not completely clear that the cost of having to enumerate
> > > > all the partitions on a log dir was weighed against the perceived
> > > > benefit of a more flexible RPC. (I'm sure it was, but it would be
> > > > good to say so).
> > > >
> > > > Many thanks,
> > > >
> > > > Tom
> > > >
> > > > On Wed, Apr 29, 2020 at 12:04 AM Colin McCabe <cmcc...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > I think the ReplicaStateEventResponse should have a separate error
> > > > > code for each partition.  Currently it just has one error code for
> > > > > the whole request/response, if I'm reading this right.  I think
> > > > > Jose made a similar point as well.  We should plan for scenarios
> > > > > where some replica states can be changed and some can't.
> > > > >
> > > > > Does EventType need to be an Int32?  For enums, we usually use the
> > > > > smallest reasonable type, which would be Int8 here.  We can always
> > > > > change the schema later if needed.  UNKNOWN_REPLICA_EVENT_TYPE seems
> > > > > unnecessary since INVALID_REQUEST covers this case.
> > > > >
> > > > > I'd also suggest "AlterReplicaStates[Request,Response]" as a
> > > > > slightly better name for this RPC.
> > > > >
> > > > > cheers,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Tue, Apr 7, 2020, at 12:43, David Arthur wrote:
> > > > > > Hey everyone,
> > > > > >
> > > > > > I'd like to start the discussion for KIP-589, part of the KIP-500
> > > > > > effort
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > > > > >
> > > > > > This is a proposal to use a new RPC instead of ZooKeeper for
> > > > > > notifying the controller of an offline replica. Please give a
> > > > > > read and let me know your thoughts.
> > > > > >
> > > > > > Thanks!
> > > > > > David
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > David Arthur
> > >
> >
>
>
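(A rough illustration of Colin's point above about partition-level error codes: with a single top-level code the controller can only report all-or-nothing, whereas a per-partition code lets it report partial success. The field names and placeholder error values below are made up for the sketch, not taken from the KIP's actual schema.)

```python
# Sketch of a response carrying one error code per partition, so that some
# replica state changes can succeed while others fail. Error values are
# placeholders, not Kafka's real error codes.
NONE = 0
STATE_CHANGE_FAILED = 1  # hypothetical error value for this sketch

def build_response(requested, transitioned):
    """requested: list of (topic, partition) from the request.
    transitioned: set of (topic, partition) the controller actually
    marked offline."""
    return [
        {"topic": t, "partition": p,
         "error_code": NONE if (t, p) in transitioned else STATE_CHANGE_FAILED}
        for t, p in requested
    ]
```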
