Hi Colin,

The updates look good to me. I wanted to clarify a few details about the
AUTHORIZER_NOT_READY error code:

1. Only the Authorizer implementation will know when it has finished
initializing. I had imagined that any request received in this state which
cannot be authorized based on static configuration (i.e., super.users) would
raise an AuthorizerNotReadyException (or something like that). Is that what
you had in mind?

2. Do we need any version bumps of protocols to indicate compatibility with
this error code? I think the main situation where we might see it is
`Envelope` requests.

3. In the context of forwarding, I believe the plan was to not return
AUTHORIZER_NOT_READY to clients. Instead, if the broker sees this error, it
will back off and retry the request. Is that right?

Thanks,
Jason

On Mon, Jan 31, 2022 at 3:02 PM Colin McCabe <[email protected]> wrote:

> Hi all,
>
> Thanks, everyone. I've updated the KIP.
>
> I reorganized the design section to explain bootstrapping a bit better,
> and put it first. (It was always weird that the design section came after
> the interface changes section :) ). I also added early.start.listeners and
> a section on the other configs (which are the same as in AclAuthorizer).
>
> best,
> Colin
>
> On Thu, Jan 27, 2022, at 16:38, Jason Gustafson wrote:
>
> > > My intention here was to avoid exposing Kafka clients to
> > > AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to
> > > internal components (such as the broker) then things are a lot
> > > simpler. Keep in mind there are external management systems that
> > > assume that if the broker ports accept traffic, the broker is "up".
> > > For example, Kubernetes operators, external dashboards and health
> > > checks, etc.
> >
> > What I had in mind is that AUTHORIZER_NOT_READY_ERROR would only be
> > returned in Envelope responses (and only if the client principal is not
> > a super user). The broker would retry the operation until the request
> > timeout after seeing this error, which is similar to how we handle
> > controller unavailability in general. So the error is not seen by
> > client requests. It sounds like what you are suggesting instead is
> > having a dedicated listener for inter-cluster traffic? Can you clarify
> > the rules that we use to decide which listener to send forwarded
> > requests to?
> >
> > > We've discussed this issue in the past in the context of the broker.
> > > Should we fence if the broker can't apply metadata for a certain
> > > amount of time? It's controversial because fencing may add more load
> > > to an already loaded cluster. However, without new metadata, we won't
> > > get new ACL changes, learn about new topics, etc. etc.
> >
> > Yep, I appreciate the concern. It is a bit different in the context of
> > the controller though. Without having access to the quorum, there is
> > little a controller can do anyway. That is why we filed this issue:
> > https://issues.apache.org/jira/browse/KAFKA-13621. That said, I agree
> > it would be desirable if the same approach could be used by both the
> > broker and controller.
> >
> > -Jason
> >
> > On Thu, Jan 27, 2022 at 4:14 PM Colin McCabe <[email protected]> wrote:
> >
> > > On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
> > >
> > > > Hey Colin,
> > > >
> > > > If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`,
> > > > then would we need `early.start.listeners`? This would give the
> > > > `Authorizer` a way to support partial startup, but it would be up
> > > > to the implementation to decide whether to use it. For an
> > > > `Authorizer` which depends only on an external service, there is no
> > > > dependence on the metadata log, so there is no reason to have
> > > > partial startup. For the `StandardAuthorizer`, we could let it
> > > > return a completed future from `Authorizer.start` for all of the
> > > > controller listeners. Then we can let it authorize only super.user
> > > > operations until the metadata log has been loaded (and raise
> > > > AuthorizerNotReadyException for everything else until that time).
> > >
> > > Hi Jason,
> > >
> > > My intention here was to avoid exposing Kafka clients to
> > > AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to
> > > internal components (such as the broker) then things are a lot
> > > simpler. Keep in mind there are external management systems that
> > > assume that if the broker ports accept traffic, the broker is "up".
> > > For example, Kubernetes operators, external dashboards and health
> > > checks, etc.
> > >
> > > We can continue to uphold the guarantee that the broker is up if the
> > > port is responsive if we annotate only internal ports (like the
> > > controller port, or possibly inter-broker port) as "early start."
> > > This also helps with the case where people have some internal topic
> > > they want to be able to read or write from before turning on client
> > > traffic. For example, some people send Kafka metric data to an
> > > internal Kafka topic.
> > >
> > > > > Leaving aside the preceding discussion, do you agree with
> > > > > starting up all endpoints (including non-early start ones) once
> > > > > we load a metadata snapshot? How feasible would it be for us to
> > > > > get a callback from the Raft layer the first time we caught up to
> > > > > the last stable offset? (we only want the callback the first
> > > > > time, not any other time). (I think the metadata shell also would
> > > > > want something like this, at least as an option).
> > > >
> > > > It is certainly possible to get a callback when we've reached the
> > > > high watermark, which seems like what we'd want. This does make me
> > > > wonder how we would handle the case when a controller gets
> > > > partitioned from the rest of the quorum for some time. Should there
> > > > be some point at which the controller no longer accepts
> > > > authorization requests because of the potential staleness of the
> > > > ACLs? Perhaps we could even reuse the `AUTHORIZER_NOT_READY_ERROR`
> > > > error code in this condition. In other words, maybe the readiness
> > > > condition is more of a dynamic assertion that we have caught up to
> > > > near the current end of the log.
> > >
> > > We've discussed this issue in the past in the context of the broker.
> > > Should we fence if the broker can't apply metadata for a certain
> > > amount of time? It's controversial because fencing may add more load
> > > to an already loaded cluster. However, without new metadata, we won't
> > > get new ACL changes, learn about new topics, etc. etc.
> > >
> > > This is probably something we should discuss in a separate KIP, since
> > > it's a bigger issue about how we handle nodes with stale metadata.
> > > And of course this issue does exist in the current ZK implementation
> > > -- you can talk to a broker that has been fenced from ZK for days,
> > > and there you are relying on whatever ACLs were in place when it
> > > could last talk to ZK. If we implement bounded staleness for KRaft,
> > > we may want to consider doing the same for ZK, for consistency.
> > >
> > > Another approach, which I suspect is more realistic in practice, is
> > > to have a metric tracking staleness and fire off an alert when it
> > > grows too high. Making the stale node inaccessible is sort of the
> > > nuclear option, and may tend to make a bad situation worse in
> > > practice.
> > >
> > > best,
> > > Colin
> > >
> > > > On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <[email protected]>
> > > > wrote:
> > > >
> > > > > On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
> > > > >
> > > > > > Hi Colin,
> > > > > >
> > > > > > Thanks for the KIP. A couple of questions:
> > > > > >
> > > > > > 1) The KIP says:
> > > > > > *However, when the controller is active, its Authorizer state
> > > > > > may be slightly ahead of the broker's Authorizer state. This
> > > > > > will happen in the time between when a new ACL is created (or
> > > > > > deleted) and the time that this change is persisted durably by
> > > > > > a majority of controller quorum peers.*
> > > > > >
> > > > > > Once the ACL takes effect on the controller, do we guarantee
> > > > > > that it will be applied everywhere or can there be scenarios
> > > > > > where the controller has applied the change, but others don't
> > > > > > due to a subsequent failure in persisting the ACL changes?
> > > > >
> > > > > Hi Rajini,
> > > > >
> > > > > Thanks for taking a look.
> > > > >
> > > > > Good question. In general, for any new metadata record (not just
> > > > > ACL records), there are two fates:
> > > > > 1. the record will be persisted to the raft quorum.
> > > > > 2. the raft leader will fail and a new leader will be elected
> > > > > that doesn't have the new record.
> > > > >
> > > > > In the case of #2, the standby controllers and the brokers will
> > > > > never apply the record.
> > > > >
> > > > > In general the active controller always rolls back its state to
> > > > > the last committed offset if there is a failure and it loses the
> > > > > leadership. So this isn't really unique to the KRaft Authorizer,
> > > > > it just seemed worth pointing out since there will be a brief
> > > > > period when the active controller is "ahead."
> > > > >
> > > > > We do try pretty hard to apply KIP-801 ACL records in order so
> > > > > that the states you will see on brokers and standby controllers
> > > > > will always be valid states from some point in the past timeline.
> > > > > The consistency here should be at least as good as the current
> > > > > system.
> > > > >
> > > > > > 2) Have we considered using a different config with limited
> > > > > > privileges for bootstrapping instead of the all-powerful
> > > > > > *super.users*?
> > > > >
> > > > > That's an interesting idea, but I'm not sure how much we could
> > > > > limit it. The brokers and controllers at least need
> > > > > CLUSTER_ACTION on CLUSTER in order to do most of what they do.
> > > > > This might be something we could explore in a future KIP since
> > > > > it's pretty cross-cutting (if we had such a limited bootstrapping
> > > > > user, all the authorizers could implement it, not just the
> > > > > KIP-801 one...)
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > How about this:
> > > > > > >
> > > > > > > We create a configuration key called early.start.listeners
> > > > > > > which contains a list of listener names. If this is not
> > > > > > > specified, its value defaults to just the controller
> > > > > > > listeners. Optionally, other listeners can be added too.
> > > > > > >
> > > > > > > If super.users contains any user names, early start listeners
> > > > > > > will start immediately. In the beginning they only authorize
> > > > > > > users that are in super.users. All other listeners receive a
> > > > > > > new error code, AUTHORIZER_NOT_READY_ERROR. If super.users
> > > > > > > does not contain any user names, then early start listeners
> > > > > > > will not be treated differently than other listeners.
> > > > > > >
> > > > > > > This will allow the controller listeners to get started
> > > > > > > immediately if the broker user is in super.users, which will
> > > > > > > speed up startup. It also will be useful for breaking
> > > > > > > chicken/egg cycles like needing to pull the SCRAM metadata to
> > > > > > > authorize pulling the SCRAM metadata.
> > > > > > >
> > > > > > > There are still a few use cases where super.users won't be
> > > > > > > required, but it may be useful in many cases to have this
> > > > > > > early start functionality.
> > > > > > >
> > > > > > > Leaving aside the preceding discussion, do you agree with
> > > > > > > starting up all endpoints (including non-early start ones)
> > > > > > > once we load a metadata snapshot? How feasible would it be
> > > > > > > for us to get a callback from the Raft layer the first time
> > > > > > > we caught up to the last stable offset? (we only want the
> > > > > > > callback the first time, not any other time). (I think the
> > > > > > > metadata shell also would want something like this, at least
> > > > > > > as an option).
> > > > > > >
> > > > > > > best,
> > > > > > > Colin
> > > > > > >
> > > > > > > On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
> > > > > > >
> > > > > > > > Hi Colin,
> > > > > > > >
> > > > > > > > Thanks for the writeup. I had one question about
> > > > > > > > bootstrapping. For the brokers, I understand that listener
> > > > > > > > startup is delayed until the Authorizer is ready. However,
> > > > > > > > I was not very clear how this would work for the controller
> > > > > > > > listeners. We may need them to start up before the metadata
> > > > > > > > log is ready so that a quorum can be established (as noted
> > > > > > > > in the KIP). This works fine if we assume that the
> > > > > > > > controller principals are among `super.users`. For requests
> > > > > > > > forwarded from brokers, on the other hand, we need to
> > > > > > > > ensure the ACLs have been loaded properly before we begin
> > > > > > > > authorizing. The problem is that we currently use the same
> > > > > > > > listener for quorum requests and for forwarded requests. So
> > > > > > > > my question is how does the Authorizer communicate to the
> > > > > > > > controller when it is safe to begin authorizing different
> > > > > > > > request types?
> > > > > > > >
> > > > > > > > There are a couple of ways I can see this working. First,
> > > > > > > > we could allow the user to configure the listener used for
> > > > > > > > forwarded requests separately. That would work with the
> > > > > > > > existing `Authorizer.start` API. Alternatively, perhaps we
> > > > > > > > could modify `Authorizer.start` to work with something more
> > > > > > > > granular than `EndPoint`. This would allow the controller
> > > > > > > > to begin accepting requests from the other quorum members
> > > > > > > > before it is ready to authorize forwarded requests from
> > > > > > > > clients. Then we would need some way to let brokers know
> > > > > > > > when the controller is ready to accept these forwarded
> > > > > > > > requests (e.g. through an error code in the `Envelope`
> > > > > > > > response).
> > > > > > > >
> > > > > > > > What do you think?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Jason
> > > > > > > >
> > > > > > > > On Wed, Jan 12, 2022 at 12:57 PM David Arthur
> > > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > +1 binding, thanks Colin!
> > > > > > > > >
> > > > > > > > > On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I'd like to start the vote on KIP-801: Implement an
> > > > > > > > > > Authorizer that stores metadata in __cluster_metadata
> > > > > > > > > >
> > > > > > > > > > The KIP is here:
> > > > > > > > > > https://cwiki.apache.org/confluence/x/h5KqCw
> > > > > > > > > >
> > > > > > > > > > The original DISCUSS thread is here:
> > > > > > > > > > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
> > > > > > > > > >
> > > > > > > > > > Please take a look and vote if you can.
> > > > > > > > > >
> > > > > > > > > > best,
> > > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > -David
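[Editor's note: the broker-side behavior discussed above -- retrying a forwarded request with backoff until the request timeout whenever the controller answers AUTHORIZER_NOT_READY, so that clients never see the error -- can be sketched roughly as below. This is a hypothetical illustration, not Kafka code: the class name, the `Supplier`-based send hook, and the numeric error-code value are all invented for the example.]

```java
import java.time.Duration;
import java.util.function.Supplier;

public class ForwardingRetrySketch {
    // Illustrative value only; the real error code would be assigned by the KIP.
    static final short AUTHORIZER_NOT_READY = 93;

    /**
     * Calls `send` (which returns an error code) and retries with exponential
     * backoff for as long as the controller reports AUTHORIZER_NOT_READY and
     * the request timeout has not elapsed. Returns the last error code seen.
     */
    static short forwardWithRetry(Supplier<Short> send, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        long backoffMs = 50;
        while (true) {
            short error = send.get();
            if (error != AUTHORIZER_NOT_READY || System.nanoTime() >= deadline)
                return error; // success, some other error, or timed out
            try {
                Thread.sleep(backoffMs); // back off before resending the Envelope
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return error;
            }
            backoffMs = Math.min(backoffMs * 2, 1000); // cap the backoff
        }
    }

    public static void main(String[] args) {
        // Simulated controller: not ready for the first two attempts, then OK (0).
        int[] attempts = {0};
        short result = forwardWithRetry(
            () -> attempts[0]++ < 2 ? AUTHORIZER_NOT_READY : (short) 0,
            Duration.ofSeconds(5));
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

The key property, matching the discussion, is that AUTHORIZER_NOT_READY is absorbed inside the broker's forwarding path; the client only ever observes success, a different error, or an ordinary request timeout.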

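[Editor's note: Jason's suggestion that `StandardAuthorizer` could "return a completed future from `Authorizer.start` for all of the controller listeners," gating the remaining listeners on metadata-log load, can be sketched as below. This is a simplified stand-in, not the real `Authorizer` interface: listeners are plain strings here, and the class and method names are hypothetical.]

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

public class EarlyStartSketch {
    private final Set<String> earlyStartListeners; // from early.start.listeners
    private final boolean haveSuperUsers;          // super.users is non-empty
    // Shared future, completed once the metadata log has been loaded.
    private final CompletableFuture<Void> metadataLoaded = new CompletableFuture<>();

    public EarlyStartSketch(Set<String> earlyStartListeners, boolean haveSuperUsers) {
        this.earlyStartListeners = earlyStartListeners;
        this.haveSuperUsers = haveSuperUsers;
    }

    /**
     * One readiness future per listener. Early-start listeners become usable
     * immediately (authorizing super.users only) when super.users is non-empty;
     * all other listeners wait until the metadata log has been loaded.
     */
    public Map<String, CompletableFuture<Void>> start(List<String> listenerNames) {
        Map<String, CompletableFuture<Void>> futures = new HashMap<>();
        for (String name : listenerNames) {
            boolean early = haveSuperUsers && earlyStartListeners.contains(name);
            futures.put(name, early
                ? CompletableFuture.completedFuture(null)
                : metadataLoaded);
        }
        return futures;
    }

    /** Invoked once (e.g. by a Raft-layer callback) when we have caught up. */
    public void onMetadataCaughtUp() {
        metadataLoaded.complete(null);
    }

    public static void main(String[] args) {
        EarlyStartSketch auth = new EarlyStartSketch(Set.of("CONTROLLER"), true);
        Map<String, CompletableFuture<Void>> f =
            auth.start(List.of("CONTROLLER", "PLAINTEXT"));
        System.out.println("CONTROLLER ready: " + f.get("CONTROLLER").isDone()); // true
        System.out.println("PLAINTEXT ready: " + f.get("PLAINTEXT").isDone());   // false
        auth.onMetadataCaughtUp();
        System.out.println("PLAINTEXT ready: " + f.get("PLAINTEXT").isDone());   // true
    }
}
```

Because every non-early listener shares the single `metadataLoaded` future, the one-time "caught up" callback from the Raft layer unblocks all of them at once, which is the startup ordering the thread converges on.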