> My intention here was to avoid exposing Kafka clients to AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal components (such as the broker) then things are a lot simpler. Keep in mind there are external management systems that assume that if the broker ports accept traffic, the broker is "up". For example, Kubernetes operators, external dashboards and health checks, etc.
What I had in mind is that AUTHORIZER_NOT_READY_ERROR would only be returned in Envelope responses (and only if the client principal is not a super user). The broker would retry the operation until the request timeout after seeing this error, which is similar to how we handle controller unavailability in general. So the error is not seen by client requests.

It sounds like what you are suggesting instead is having a dedicated listener for inter-cluster traffic? Can you clarify the rules that we use to decide which listener to send forwarded requests to?

> We've discussed this issue in the past in the context of the broker. Should we fence if the broker can't apply metadata for a certain amount of time? It's controversial because fencing may add more load to an already loaded cluster. However, without new metadata, we won't get new ACL changes, learn about new topics, etc. etc.

Yep, I appreciate the concern. It is a bit different in the context of the controller, though. Without having access to the quorum, there is little a controller can do anyway. That is why we filed this issue: https://issues.apache.org/jira/browse/KAFKA-13621. That said, I agree it would be desirable if the same approach could be used by both the broker and controller.

-Jason

On Thu, Jan 27, 2022 at 4:14 PM Colin McCabe <cmcc...@apache.org> wrote:

> On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
> > Hey Colin,
> >
> > If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`, then would we need `early.start.listeners`? This would give the `Authorizer` a way to support partial startup, but it would be up to the implementation to decide whether to use it. For an `Authorizer` which depends only on an external service, there is no dependence on the metadata log, so there is no reason to have partial startup. For the `StandardAuthorizer`, we could let it return a completed future from `Authorizer.start` for all of the controller listeners.
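[The broker-side retry behavior Jason describes above — retrying a forwarded request on AUTHORIZER_NOT_READY_ERROR until the request timeout, so clients never see the error — could be sketched roughly as below. All names (`EnvelopeRetrySketch`, `forwardWithRetry`, the error enum) are hypothetical illustrations, not Kafka's actual implementation.]

```java
import java.util.function.Supplier;

// Hypothetical sketch of a broker retrying a forwarded (Envelope) request
// while the controller's authorizer is still loading. Illustrative only;
// this is not Kafka's real code.
public class EnvelopeRetrySketch {
    enum ErrorCode { NONE, AUTHORIZER_NOT_READY }

    // Retry the forwarded request until it succeeds or the request timeout
    // elapses; AUTHORIZER_NOT_READY is treated as a transient condition,
    // similar to controller unavailability.
    static ErrorCode forwardWithRetry(Supplier<ErrorCode> sendEnvelope,
                                      long requestTimeoutMs,
                                      long retryBackoffMs) {
        long deadline = System.currentTimeMillis() + requestTimeoutMs;
        ErrorCode error = sendEnvelope.get();
        while (error == ErrorCode.AUTHORIZER_NOT_READY
                && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(retryBackoffMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            error = sendEnvelope.get();
        }
        // The client only ever sees this error if we exhaust the timeout.
        return error;
    }

    public static void main(String[] args) {
        // Simulate an authorizer that becomes ready on the third attempt.
        int[] attempts = {0};
        ErrorCode result = forwardWithRetry(
                () -> ++attempts[0] < 3 ? ErrorCode.AUTHORIZER_NOT_READY
                                        : ErrorCode.NONE,
                1000, 1);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```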
> > Then we can let it authorize only super.user operations until the metadata log has been loaded (and raise AuthorizerNotReadyException for everything else until that time).
>
> Hi Jason,
>
> My intention here was to avoid exposing Kafka clients to AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal components (such as the broker) then things are a lot simpler. Keep in mind there are external management systems that assume that if the broker ports accept traffic, the broker is "up". For example, Kubernetes operators, external dashboards and health checks, etc.
>
> We can continue to uphold the guarantee that the broker is up if the port is responsive if we annotate only internal ports (like the controller port, or possibly the inter-broker port) as "early start". This also helps with the case where people have some internal topic they want to be able to read or write before turning on client traffic. For example, some people send Kafka metric data to an internal Kafka topic.
>
> >> Leaving aside the preceding discussion, do you agree with starting up all endpoints (including non-early start ones) once we load a metadata snapshot? How feasible would it be for us to get a callback from the Raft layer the first time we caught up to the last stable offset? (we only want the callback the first time, not any other time). (I think the metadata shell also would want something like this, at least as an option).
> >
> > It is certainly possible to get a callback when we've reached the high watermark, which seems like what we'd want. This does make me wonder how we would handle the case when a controller gets partitioned from the rest of the quorum for some time. Should there be some point at which the controller no longer accepts authorization requests because of the potential staleness of the ACLs?
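[The partial-startup behavior discussed above — super.user operations allowed immediately, everything else rejected until the metadata log has been loaded — might look like the toy sketch below. The class, method names, and `Result` enum are all invented for illustration; this is not the real StandardAuthorizer.]

```java
import java.util.Set;

// Toy model of the partial-startup idea: before the metadata log is loaded,
// only principals in super.users are authorized; everyone else is told the
// authorizer is not ready yet. All names here are invented.
public class EarlyStartAuthorizerSketch {
    enum Result { ALLOWED, DENIED, NOT_READY }

    private final Set<String> superUsers;
    private volatile boolean metadataLoaded = false;

    EarlyStartAuthorizerSketch(Set<String> superUsers) {
        this.superUsers = superUsers;
    }

    // Called once the authorizer has caught up with the metadata log.
    void completeInitialLoad() {
        metadataLoaded = true;
    }

    Result authorize(String principal) {
        if (superUsers.contains(principal))
            return Result.ALLOWED;      // super.users work even before load
        if (!metadataLoaded)
            return Result.NOT_READY;    // would map to AUTHORIZER_NOT_READY_ERROR
        return lookupAcls(principal);
    }

    // Stand-in for a real ACL lookup against the loaded metadata; a real
    // authorizer would consult its ACL cache here.
    private Result lookupAcls(String principal) {
        return Result.DENIED;
    }
}
```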
> > Perhaps we could even reuse the `AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other words, maybe the readiness condition is more of a dynamic assertion that we have caught up to near the current end of the log.
>
> We've discussed this issue in the past in the context of the broker. Should we fence if the broker can't apply metadata for a certain amount of time? It's controversial because fencing may add more load to an already loaded cluster. However, without new metadata, we won't get new ACL changes, learn about new topics, etc. etc.
>
> This is probably something we should discuss in a separate KIP, since it's a bigger issue about how we handle nodes with stale metadata. And of course this issue does exist in the current ZK implementation -- you can talk to a broker that has been fenced from ZK for days, and there you are relying on whatever ACLs were in place when it could last talk to ZK. If we implement bounded staleness for KRaft, we may want to consider doing the same for ZK, for consistency.
>
> Another approach, which I suspect is more realistic in practice, is to have a metric tracking staleness and fire off an alert when it grows too high. Making the stale node inaccessible is sort of the nuclear option, and may tend to make a bad situation worse in practice.
>
> best,
> Colin
>
> > On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <cmcc...@apache.org> wrote:
> >
> >> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
> >> > Hi Colin,
> >> >
> >> > Thanks for the KIP. A couple of questions:
> >> >
> >> > 1) The KIP says:
> >> > *However, when the controller is active, its Authorizer state may be slightly ahead of the broker's Authorizer state.
> >> > This will happen in the time between when a new ACL is created (or deleted) and the time that this change is persisted durably by a majority of controller quorum peers.*
> >> >
> >> > Once the ACL takes effect on the controller, do we guarantee that it will be applied everywhere, or can there be scenarios where the controller has applied the change but others don't, due to a subsequent failure in persisting the ACL changes?
> >>
> >> Hi Rajini,
> >>
> >> Thanks for taking a look.
> >>
> >> Good question. In general, for any new metadata record (not just ACL records), there are two fates:
> >> 1. the record will be persisted to the raft quorum.
> >> 2. the raft leader will fail and a new leader will be elected that doesn't have the new record.
> >>
> >> In the case of #2, the standby controllers and the brokers will never apply the record.
> >>
> >> In general the active controller always rolls back its state to the last committed offset if there is a failure and it loses the leadership. So this isn't really unique to the KRaft Authorizer; it just seemed worth pointing out since there will be a brief period when the active controller is "ahead."
> >>
> >> We do try pretty hard to apply KIP-801 ACL records in order, so that the states you will see on brokers and standby controllers will always be valid states from some point in the past timeline. The consistency here should be at least as good as the current system.
> >>
> >> > 2) Have we considered using a different config with limited privileges for bootstrapping, instead of the all-powerful *super.users*?
> >>
> >> That's an interesting idea, but I'm not sure how much we could limit it. The brokers and controllers at least need CLUSTER_ACTION on CLUSTER in order to do most of what they do.
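[Colin's "two fates" description above can be modeled with a tiny in-memory log: records past the last committed offset are simply truncated if leadership is lost, so standbys and brokers never apply them. Purely illustrative; the class and method names are invented and this is not KRaft's actual log implementation.]

```java
import java.util.ArrayList;
import java.util.List;

// Tiny model of the "two fates" of a metadata record: either a quorum
// majority commits it, or the leader fails and its uncommitted tail is
// rolled back. Illustrative only.
public class MetadataLogSketch {
    private final List<String> records = new ArrayList<>();
    private int lastCommitted = 0;  // number of records durably committed

    void append(String record) {
        records.add(record);
    }

    // Fate 1: a quorum majority persists everything appended so far.
    void commitAll() {
        lastCommitted = records.size();
    }

    // Fate 2: leadership is lost; the active controller rolls back to the
    // last committed offset, so the uncommitted records are never applied
    // by standby controllers or brokers.
    void loseLeadership() {
        records.subList(lastCommitted, records.size()).clear();
    }

    List<String> committedRecords() {
        return records.subList(0, lastCommitted);
    }

    int size() {
        return records.size();
    }
}
```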
> >> This might be something we could explore in a future KIP, since it's pretty cross-cutting (if we had such a limited bootstrapping user, all the authorizers could implement it, not just the KIP-801 one...)
> >>
> >> best,
> >> Colin
> >>
> >> > Regards,
> >> >
> >> > Rajini
> >> >
> >> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cmcc...@apache.org> wrote:
> >> >
> >> >> How about this:
> >> >>
> >> >> We create a configuration key called early.start.listeners which contains a list of listener names. If this is not specified, its value defaults to just the controller listeners. Optionally, other listeners can be added too.
> >> >>
> >> >> If super.users contains any user names, early start listeners will start immediately. In the beginning they only authorize users that are in super.users. All other listeners receive a new error code, AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user names, then early start listeners will not be treated differently than other listeners.
> >> >>
> >> >> This will allow the controller listeners to get started immediately if the broker user is in super.users, which will speed up startup. It will also be useful for breaking chicken/egg cycles, like needing to pull the SCRAM metadata to authorize pulling the SCRAM metadata.
> >> >>
> >> >> There are still a few use cases where super.users won't be required, but it may be useful in many cases to have this early start functionality.
> >> >>
> >> >> Leaving aside the preceding discussion, do you agree with starting up all endpoints (including non-early start ones) once we load a metadata snapshot? How feasible would it be for us to get a callback from the Raft layer the first time we caught up to the last stable offset?
> >> >> (we only want the callback the first time, not any other time). (I think the metadata shell also would want something like this, at least as an option).
> >> >>
> >> >> best,
> >> >> Colin
> >> >>
> >> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
> >> >> > Hi Colin,
> >> >> >
> >> >> > Thanks for the writeup. I had one question about bootstrapping. For the brokers, I understand that listener startup is delayed until the Authorizer is ready. However, it was not very clear to me how this would work for the controller listeners. We may need them to start up before the metadata log is ready so that a quorum can be established (as noted in the KIP). This works fine if we assume that the controller principals are among `super.users`. For requests forwarded from brokers, on the other hand, we need to ensure the ACLs have been loaded properly before we begin authorizing. The problem is that we currently use the same listener for quorum requests and for forwarded requests. So my question is: how does the Authorizer communicate to the controller when it is safe to begin authorizing different request types?
> >> >> >
> >> >> > There are a couple of ways I can see this working. First, we could allow the user to configure the listener used for forwarded requests separately. That would work with the existing `Authorizer.start` API. Alternatively, perhaps we could modify `Authorizer.start` to work with something more granular than `EndPoint`. This would allow the controller to begin accepting requests from the other quorum members before it is ready to authorize forwarded requests from clients.
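[The granular-start idea above, combined with the early.start.listeners proposal, amounts to `Authorizer.start` completing the future for early-start (e.g. controller) listeners immediately and completing the rest only once the metadata log has been loaded. A self-contained sketch, with listeners modeled as plain strings and all names hypothetical:]

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Sketch of per-listener startup futures: early-start listeners (such as
// the controller listener) are ready immediately; all other listeners
// become ready only after the metadata log has been loaded. Names are
// illustrative, not Kafka's actual types.
public class ListenerStartupSketch {
    private final CompletableFuture<Void> metadataLoaded = new CompletableFuture<>();

    // Mirrors the shape of Authorizer.start: one completion future per
    // listener, telling the server when it may start accepting traffic.
    Map<String, CompletableFuture<Void>> start(Set<String> allListeners,
                                               Set<String> earlyStartListeners) {
        Map<String, CompletableFuture<Void>> futures = new HashMap<>();
        for (String listener : allListeners) {
            if (earlyStartListeners.contains(listener)) {
                // Ready right away; only super.users would be authorized
                // until the metadata log is loaded.
                futures.put(listener, CompletableFuture.completedFuture(null));
            } else {
                // Completes when onMetadataLoaded() fires.
                futures.put(listener, metadataLoaded.thenRun(() -> { }));
            }
        }
        return futures;
    }

    // Called once, the first time we catch up to the metadata log
    // (e.g. via a callback from the Raft layer).
    void onMetadataLoaded() {
        metadataLoaded.complete(null);
    }
}
```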
> >> >> > Then we would need some way to let brokers know when the controller is ready to accept these forwarded requests (e.g. through an error code in the `Envelope` response).
> >> >> >
> >> >> > What do you think?
> >> >> >
> >> >> > Thanks,
> >> >> > Jason
> >> >> >
> >> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur <david.art...@confluent.io.invalid> wrote:
> >> >> >
> >> >> >> +1 binding, thanks Colin!
> >> >> >>
> >> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cmcc...@apache.org> wrote:
> >> >> >>
> >> >> >> > Hi all,
> >> >> >> >
> >> >> >> > I'd like to start the vote on KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata
> >> >> >> >
> >> >> >> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
> >> >> >> >
> >> >> >> > The original DISCUSS thread is here:
> >> >> >> > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
> >> >> >> >
> >> >> >> > Please take a look and vote if you can.
> >> >> >> >
> >> >> >> > best,
> >> >> >> > Colin
> >> >> >>
> >> >> >> --
> >> >> >> -David
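[Colin's metric-plus-alert alternative to fencing, mentioned earlier in the thread, could be as simple as tracking the timestamp of the last applied metadata record and alerting when staleness exceeds a threshold. A minimal sketch; the class, method names, and thresholds are all invented:]

```java
// Sketch of the metric-and-alert alternative to fencing: track how long it
// has been since this node last applied a metadata record, and alert when
// that staleness grows too high, rather than making the node inaccessible
// (the "nuclear option"). All names and thresholds are invented.
public class MetadataStalenessSketch {
    private volatile long lastAppliedMs;

    MetadataStalenessSketch(long nowMs) {
        this.lastAppliedMs = nowMs;
    }

    // Call whenever a metadata record is applied.
    void recordApplied(long nowMs) {
        lastAppliedMs = nowMs;
    }

    // The value a hypothetical "metadata-staleness-ms" gauge would report.
    long stalenessMs(long nowMs) {
        return nowMs - lastAppliedMs;
    }

    // Fire an alert instead of fencing when staleness crosses a threshold.
    boolean shouldAlert(long nowMs, long thresholdMs) {
        return stalenessMs(nowMs) > thresholdMs;
    }
}
```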