I've updated the KIP with the feedback from this discussion https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller. I'll send out the vote thread shortly.
Thanks again, David On Tue, May 5, 2020 at 10:34 AM Tom Bentley <tbent...@redhat.com> wrote: > Hi Colin, > > Yeah, that makes sense, thanks. I was thinking, longer term, that there are > other benefits to having the log dir information available to the > controller. For example it would allow the possibility for CREATE_TOPIC > requests to include the intended log dir for each replica. But that's > obviously completely out of scope for this KIP. > > Kind regards, > > Tom > > On Mon, May 4, 2020 at 10:11 PM Colin McCabe <cmcc...@apache.org> wrote: > > > Hi Tom, > > > > As you said, the controller doesn't know about log directories, although > > individual brokers do. So the brokers do currently have to enumerate all > > the partitions that need to be removed to the controllers explicitly. So > > this KIP isn't changing anything in that regard. > > > > The current flow is: > > 1. ping ZK back-channel > > 2. controller sends a full LeaderAndIsrRequest to the broker > > 3. the broker sends a full response containing error codes for all > > partitions. Partitions on the failed storage have a nonzero error code; > > the others have 0. > > > > The new flow is: > > 1. the broker sends an RPC with all the failed partitions > > > > So the new flow actually substantially reduces the amount of network > > traffic, since previously we sent a full LeaderAndIsrRequest containing > all > > of the partitions. Now we just send all the partitions in the failed > > storage directory. That could still be a lot, but certainly only be a > > fraction of what a full LeaderAndIsrRequest would have. > > > > Sorry if I'm repeating stuff you already figured out, but I just wanted > to > > be more clear about this (I found it confusing too until David explained > it > > to me originally...) > > > > best, > > Colin > > > > > > On Sat, May 2, 2020, at 10:30, Tom Bentley wrote: > > > Hi David, > > > > > > > In the rejecting the alternative of having an RPC for log dir > failures > > > > you say > > > > > > > > I guess what I really mean here is that I wanted to avoid exposing > the > > > > notion of a log dir to the controller. I can update the description > to > > > > reflect this. > > > > > > > > > > Ah, I think I see now. While each broker knows about its log dirs this > > > isn't something that's stored in zookeeper or known to the controller. > > > > > > > > > > > It's also not completely clear that the cost of having to enumerate > > all > > > > the partitions on a log dir was weighed against the perceived benefit > > of a > > > > more flexible RPC. > > > > > > > > The enumeration isn't strictly required. In the "RPC semantics" > > section, I > > > > mention that if no topics are present in the RPC request, then all > > topics > > > > on the broker are implied. And if a topic is given with no > partitions, > > all > > > > partitions for that topic (on the broker) are implied. Does this make > > > > sense? > > > > > > > > > > So the no-topics-present optimisation wouldn't be available to a broker > > > with >1 log dirs where only some of the log dirs failed. I don't > suppose > > > that's a problem though. > > > > > > Thanks again, > > > > > > Tom > > > > > > > > > On Fri, May 1, 2020 at 5:48 PM David Arthur <mum...@gmail.com> wrote: > > > > > > > Jose/Colin/Tom, thanks for the feedback! > > > > > > > > > Partition level errors > > > > > > > > This was an oversight on my part, I meant to include these in the > > response > > > > RPC. I'll update that. > > > > > > > > > INVALID_REQUEST > > > > > > > > I'll update this text description, that was a copy/paste left over > > > > > > > > > I think we should mention that the controller will keep it's > current > > > > implementation of marking the replicas as offline because of failure > > in the > > > > LeaderAndIsr response. > > > > > > > > Good suggestions, I'll add that. > > > > > > > > > Does EventType need to be an Int32? > > > > > > > > No, it doesn't. I'll update to Int8. Do we have an example of the > enum > > > > paradigm in our RPC today? I'm curious if we actually map it to a > real > > Java > > > > enum in the AbstractRequest/Response classes. > > > > > > > > > AlterReplicaStates > > > > > > > > Sounds good to me. > > > > > > > > > In the rejecting the alternative of having an RPC for log dir > > failures > > > > you say > > > > > > > > I guess what I really mean here is that I wanted to avoid exposing > the > > > > notion of a log dir to the controller. I can update the description > to > > > > reflect this. > > > > > > > > > It's also not completely clear that the cost of having to enumerate > > all > > > > the partitions on a log dir was weighed against the perceived benefit > > of a > > > > more flexible RPC. > > > > > > > > The enumeration isn't strictly required. In the "RPC semantics" > > section, I > > > > mention that if no topics are present in the RPC request, then all > > topics > > > > on the broker are implied. And if a topic is given with no > partitions, > > all > > > > partitions for that topic (on the broker) are implied. Does this make > > > > sense? > > > > > > > > Thanks again! I'll update the KIP and leave a message here once it's > > > > revised. > > > > > > > > David > > > > > > > > On Wed, Apr 29, 2020 at 11:20 AM Tom Bentley <tbent...@redhat.com> > > wrote: > > > > > > > > > Hi David, > > > > > > > > > > Thanks for the KIP! > > > > > > > > > > In the rejecting the alternative of having an RPC for log dir > > failures > > > > you > > > > > say: > > > > > > > > > > It was also rejected to prevent "leaking" the notion of a log dir > to > > the > > > > > > public API. > > > > > > > > > > > > > > > > I'm not quite sure I follow that argument, since we already have > > RPCs for > > > > > changing replica log dirs. So in a general sense log dirs already > > exist > > > > in > > > > > the API. I suspect you were using public API to mean something more > > > > > specific; could you elaborate? > > > > > > > > > > It's also not completely clear that the cost of having to enumerate > > all > > > > the > > > > > partitions on a log dir was weighed against the perceived benefit > of > > a > > > > more > > > > > flexible RPC. (I'm sure it was, but it would be good to say so). > > > > > > > > > > Many thanks, > > > > > > > > > > Tom > > > > > > > > > > On Wed, Apr 29, 2020 at 12:04 AM Colin McCabe <cmcc...@apache.org> > > > > wrote: > > > > > > > > > > > Hi David, > > > > > > > > > > > > Thanks for the KIP! > > > > > > > > > > > > I think the ReplicaStateEventResponse should have a separate > error > > code > > > > > > for each partition. > > > > > > Currently it just has one error code for the whole > > request/response, > > > > if > > > > > > I'm reading this right. I think Jose made a similar point as > > well. We > > > > > > should plan for scenarios where some replica states can be > changed > > and > > > > > some > > > > > > can't. > > > > > > > > > > > > Does EventType need to be an Int32? For enums, we usually use > the > > > > > > smallest reasonable type, which would be Int8 here. We can > always > > > > change > > > > > > the schema later if needed. UNKNOWN_REPLICA_EVENT_TYPE seems > > > > unnecessary > > > > > > since INVALID_REQUEST covers this case. > > > > > > > > > > > > I'd also suggest "AlterReplicaStates[Request,Response]" as a > > slightly > > > > > > better name for this RPC. > > > > > > > > > > > > cheers, > > > > > > Colin > > > > > > > > > > > > > > > > > > On Tue, Apr 7, 2020, at 12:43, David Arthur wrote: > > > > > > > Hey everyone, > > > > > > > > > > > > > > I'd like to start the discussion for KIP-589, part of the > KIP-500 > > > > > effort > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller > > > > > > > > > > > > > > This is a proposal to use a new RPC instead of ZooKeeper for > > > > notifying > > > > > > the > > > > > > > controller of an offline replica. Please give a read and let me > > know > > > > > your > > > > > > > thoughts. > > > > > > > > > > > > > > Thanks! > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > David Arthur > > > > > > > > > > > > -- David Arthur