Hi all,

Thanks, everyone. I've updated the KIP.
I reorganized the design section to explain bootstrapping a bit better, and put it first. (It was always weird that the design section came after the interface changes section :) ). I also added early.start.listeners and a section on the other configs (which are the same as in AclAuthorizer).

best,
Colin

On Thu, Jan 27, 2022, at 16:38, Jason Gustafson wrote:
>> My intention here was to avoid exposing Kafka clients to AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal components (such as the broker) then things are a lot simpler. Keep in mind there are external management systems that assume that if the broker ports accept traffic, the broker is "up". For example, Kubernetes operators, external dashboards and health checks, etc.
>
> What I had in mind is that AUTHORIZER_NOT_READY_ERROR would only be returned in Envelope responses (and only if the client principal is not a super user). The broker would retry the operation until the request timeout after seeing this error, which is similar to how we handle controller unavailability in general. So the error is not seen by client requests. It sounds like what you are suggesting instead is having a dedicated listener for inter-cluster traffic? Can you clarify the rules that we use to decide which listener to send forwarded requests to?
>
>> We've discussed this issue in the past in the context of the broker. Should we fence if the broker can't apply metadata for a certain amount of time? It's controversial because fencing may add more load to an already loaded cluster. However, without new metadata, we won't get new ACL changes, learn about new topics, etc. etc.
>
> Yep, I appreciate the concern. It is a bit different in the context of the controller though. Without having access to the quorum, there is little a controller can do anyway. That is why we filed this issue: https://issues.apache.org/jira/browse/KAFKA-13621. That said, I agree it would be desirable if the same approach could be used by both the broker and controller.
>
> -Jason
>
> On Thu, Jan 27, 2022 at 4:14 PM Colin McCabe <cmcc...@apache.org> wrote:
>
>> On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
>> > Hey Colin,
>> >
>> > If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`, then would we need `early.start.listeners`? This would give the `Authorizer` a way to support partial startup, but it would be up to the implementation to decide whether to use it. For an `Authorizer` which depends only on an external service, there is no dependence on the metadata log, so there is no reason to have partial startup. For the `StandardAuthorizer`, we could let it return a completed future from `Authorizer.start` for all of the controller listeners. Then we can let it authorize only super.user operations until the metadata log has been loaded (and raise AuthorizerNotReadyException for everything else until that time).
>> >
>> Hi Jason,
>>
>> My intention here was to avoid exposing Kafka clients to AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal components (such as the broker) then things are a lot simpler. Keep in mind there are external management systems that assume that if the broker ports accept traffic, the broker is "up". For example, Kubernetes operators, external dashboards and health checks, etc.
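
To make the suggestion quoted above from the 15:52 mail (letting Authorizer.start return an already-completed future for the controller listeners) a bit more concrete, here is a minimal sketch. It only mirrors the shape of the existing Authorizer.start(AuthorizerServerInfo) contract; the class and field names are illustrative and not part of the KIP:

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

import org.apache.kafka.common.Endpoint;
import org.apache.kafka.server.authorizer.AuthorizerServerInfo;

// Sketch only: complete the startup future for early start listeners
// immediately, and hold back the futures for all other listeners until the
// metadata log has been loaded.
public class EarlyStartFutures {
    // Listener names to treat as early start (e.g. the controller listeners).
    private final Set<String> earlyStartListenerNames;
    // Completed once the authorizer has loaded its ACLs from the metadata log.
    private final CompletableFuture<Void> metadataLoadedFuture = new CompletableFuture<>();

    public EarlyStartFutures(Set<String> earlyStartListenerNames) {
        this.earlyStartListenerNames = earlyStartListenerNames;
    }

    // Same shape as Authorizer.start(AuthorizerServerInfo): one future per endpoint.
    public Map<Endpoint, ? extends CompletionStage<Void>> start(AuthorizerServerInfo serverInfo) {
        Map<Endpoint, CompletableFuture<Void>> futures = new HashMap<>();
        Collection<Endpoint> endpoints = serverInfo.endpoints();
        for (Endpoint endpoint : endpoints) {
            String listenerName = endpoint.listenerName().orElse("");
            if (earlyStartListenerNames.contains(listenerName)) {
                // Early start listeners can be brought up right away; until the
                // metadata is loaded they would only authorize super.users.
                futures.put(endpoint, CompletableFuture.completedFuture(null));
            } else {
                // All other listeners wait for the metadata log to be loaded.
                futures.put(endpoint, metadataLoadedFuture);
            }
        }
        return futures;
    }

    // Called once the initial metadata snapshot has been applied.
    public void completeMetadataLoad() {
        metadataLoadedFuture.complete(null);
    }
}

As noted later in the thread, broker listener startup is already delayed until the Authorizer is ready, so the only new decision here is which listeners get a pre-completed future.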
>>
>> We can continue to uphold the guarantee that the broker is up if the port is responsive if we annotate only internal ports (like the controller port, or possibly inter-broker port) as "early start." This also helps with the case where people have some internal topic they want to be able to read or write from before turning on client traffic. For example, some people send Kafka metric data to an internal Kafka topic.
>>
>> >> Leaving aside the preceding discussion, do you agree with starting up all endpoints (including non-early start ones) once we load a metadata snapshot? How feasible would it be for us to get a callback from the Raft layer the first time we caught up to the last stable offset? (we only want the callback the first time, not any other time). (I think the metadata shell also would want something like this, at least as an option).
>> >
>> > It is certainly possible to get a callback when we've reached the high watermark, which seems like what we'd want. This does make me wonder how we would handle the case when a controller gets partitioned from the rest of the quorum for some time. Should there be some point at which the controller no longer accepts authorization requests because of the potential staleness of the ACLs? Perhaps we could even reuse the `AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other words, maybe the readiness condition is more of a dynamic assertion that we have caught up to near the current end of the log.
>> >
>> We've discussed this issue in the past in the context of the broker. Should we fence if the broker can't apply metadata for a certain amount of time? It's controversial because fencing may add more load to an already loaded cluster. However, without new metadata, we won't get new ACL changes, learn about new topics, etc. etc.
>>
>> This is probably something we should discuss in a separate KIP, since it's a bigger issue about how we handle nodes with stale metadata. And of course this issue does exist in the current ZK implementation -- you can talk to a broker that has been fenced from ZK for days, and there you are relying on whatever ACLs were in place when it could last talk to ZK. If we implement bounded staleness for KRaft, we may want to consider doing the same for ZK, for consistency.
>>
>> Another approach, which I suspect is more realistic in practice, is to have a metric tracking staleness and fire off an alert when it grows too high. Making the stale node inaccessible is sort of the nuclear option, and may tend to make a bad situation worse in practice.
>>
>> best,
>> Colin
>>
>> > On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <cmcc...@apache.org> wrote:
>> >
>> >> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
>> >> > Hi Colin,
>> >> >
>> >> > Thanks for the KIP. A couple of questions:
>> >> >
>> >> > 1) The KIP says:
>> >> > *However, when the controller is active, its Authorizer state may be slightly ahead of the broker's Authorizer state. This will happen in the time between when a new ACL is created (or deleted) and the time that this change is persisted durably by a majority of controller quorum peers.*
>> >> >
>> >> > Once the ACL takes effect on the controller, do we guarantee that it will be applied everywhere or can there be scenarios where the controller has applied the change, but others don't due to a subsequent failure in persisting the ACL changes?
>> >>
>> >> Hi Rajini,
>> >>
>> >> Thanks for taking a look.
>> >>
>> >> Good question. In general, for any new metadata record (not just ACL records), there are two fates:
>> >> 1. the record will be persisted to the raft quorum.
>> >> 2. the raft leader will fail and a new leader will be elected that doesn't have the new record.
>> >>
>> >> In the case of #2, the standby controllers and the brokers will never apply the record.
>> >>
>> >> In general the active controller always rolls back its state to the last committed offset if there is a failure and it loses the leadership. So this isn't really unique to the KRaft Authorizer, it just seemed worth pointing out since there will be a brief period when the active controller is "ahead."
>> >>
>> >> We do try pretty hard to apply KIP-801 ACL records in order so that the states you will see on brokers and standby controllers will always be valid states from some point in the past timeline. The consistency here should be at least as good as the current system.
>> >>
>> >> > 2) Have we considered using a different config with limited privileges for bootstrapping instead of the all-powerful *super.users*?
>> >>
>> >> That's an interesting idea, but I'm not sure how much we could limit it. The brokers and controllers at least need CLUSTER_ACTION on CLUSTER in order to do most of what they do. This might be something we could explore in a future KIP since it's pretty cross-cutting (if we had such a limited bootstrapping user, all the authorizers could implement it, not just the KIP-801 one...)
>> >>
>> >> best,
>> >> Colin
>> >>
>> >> > Regards,
>> >> >
>> >> > Rajini
>> >> >
>> >> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cmcc...@apache.org> wrote:
>> >> >
>> >> >> How about this:
>> >> >>
>> >> >> We create a configuration key called early.start.listeners which contains a list of listener names. If this is not specified, its value defaults to just the controller listeners. Optionally, other listeners can be added too.
>> >> >>
>> >> >> If super.users contains any user names, early start listeners will start immediately. In the beginning they only authorize users that are in super.users. All other listeners receive a new error code, AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user names, then early start listeners will not be treated differently than other listeners.
>> >> >>
>> >> >> This will allow the controller listeners to get started immediately if the broker user is in super.users, which will speed up startup. It also will be useful for breaking chicken/egg cycles like needing to pull the SCRAM metadata to authorize pulling the SCRAM metadata.
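
For reference, under the proposal quoted just above, the relevant configuration might look roughly like this. The listener names and principals are only examples, and early.start.listeners is the new key being proposed; the other keys already exist for KRaft mode:

# Example only -- listener names, ports, and principals are illustrative.
listeners=CONTROLLER://:9093,BROKER://:9092
controller.listener.names=CONTROLLER

# New key proposed in this thread. If unset, it defaults to the controller listeners.
early.start.listeners=CONTROLLER

# Principals that early start listeners will authorize before the metadata log is loaded.
super.users=User:kafka;User:admin

With a layout like this, the CONTROLLER listener could come up immediately and authorize only User:kafka and User:admin, while the BROKER listener would stay closed until the metadata log (and therefore the ACLs) had been loaded.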
>> >> >>
>> >> >> There are still a few use cases where super.users won't be required, but it may be useful in many cases to have this early start functionality.
>> >> >>
>> >> >> Leaving aside the preceding discussion, do you agree with starting up all endpoints (including non-early start ones) once we load a metadata snapshot? How feasible would it be for us to get a callback from the Raft layer the first time we caught up to the last stable offset? (we only want the callback the first time, not any other time). (I think the metadata shell also would want something like this, at least as an option).
>> >> >>
>> >> >> best,
>> >> >> Colin
>> >> >>
>> >> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
>> >> >> > Hi Colin,
>> >> >> >
>> >> >> > Thanks for the writeup. I had one question about bootstrapping. For the brokers, I understand that listener startup is delayed until the Authorizer is ready. However, I was not very clear on how this would work for the controller listeners. We may need them to start up before the metadata log is ready so that a quorum can be established (as noted in the KIP). This works fine if we assume that the controller principals are among `super.users`. For requests forwarded from brokers, on the other hand, we need to ensure the ACLs have been loaded properly before we begin authorizing. The problem is that we currently use the same listener for quorum requests and for forwarded requests. So my question is how does the Authorizer communicate to the controller when it is safe to begin authorizing different request types?
>> >> >> >
>> >> >> > There are a couple ways I can see this working. First, we could allow the user to configure the listener used for forwarded requests separately. That would work with the existing `Authorizer.start` API. Alternatively, perhaps we could modify `Authorizer.start` to work with something more granular than `EndPoint`. This would allow the controller to begin accepting requests from the other quorum members before it is ready to authorize forwarded requests from clients. Then we would need some way to let brokers know when the controller is ready to accept these forwarded requests (e.g. through an error code in the `Envelope` response).
>> >> >> >
>> >> >> > What do you think?
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Jason
>> >> >> >
>> >> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur <david.art...@confluent.io.invalid> wrote:
>> >> >> >
>> >> >> >> +1 binding, thanks Colin!
>> >> >> >>
>> >> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cmcc...@apache.org> wrote:
>> >> >> >>
>> >> >> >> > Hi all,
>> >> >> >> >
>> >> >> >> > I'd like to start the vote on KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata
>> >> >> >> >
>> >> >> >> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
>> >> >> >> >
>> >> >> >> > The original DISCUSS thread is here:
>> >> >> >> > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>> >> >> >> >
>> >> >> >> > Please take a look and vote if you can.
>> >> >> >> >
>> >> >> >> > best,
>> >> >> >> > Colin
>> >> >> >>
>> >> >> >> --
>> >> >> >> -David
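
As a rough sketch of the other half of the mechanism discussed in this thread -- authorizing only super.users until the metadata log has been loaded, and signalling "not ready" rather than DENIED for everyone else so that internal callers (such as forwarded Envelope requests) can be retried -- something along these lines could work. The class, the helper names, and AuthorizerNotReadyException are all illustrative stand-ins, not a committed API; only the authorize() signature mirrors the existing Authorizer interface:

import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.apache.kafka.common.security.auth.KafkaPrincipal;
import org.apache.kafka.server.authorizer.Action;
import org.apache.kafka.server.authorizer.AuthorizableRequestContext;
import org.apache.kafka.server.authorizer.AuthorizationResult;

// Sketch only: gate authorization on whether the ACLs have been loaded yet.
public class BootstrapAuthorizerGate {

    // Stand-in for however "authorizer not ready" is ultimately surfaced
    // (e.g. mapped to AUTHORIZER_NOT_READY_ERROR in an Envelope response).
    public static class AuthorizerNotReadyException extends RuntimeException {
    }

    private final Set<KafkaPrincipal> superUsers;       // parsed from super.users
    private volatile boolean metadataLoaded = false;    // flipped once caught up

    public BootstrapAuthorizerGate(Set<KafkaPrincipal> superUsers) {
        this.superUsers = superUsers;
    }

    public void markMetadataLoaded() {
        metadataLoaded = true;
    }

    // Same shape as Authorizer.authorize(AuthorizableRequestContext, List<Action>).
    public List<AuthorizationResult> authorize(AuthorizableRequestContext context, List<Action> actions) {
        if (!metadataLoaded) {
            if (superUsers.contains(context.principal())) {
                // super.users are allowed even before any ACLs have been loaded.
                return Collections.nCopies(actions.size(), AuthorizationResult.ALLOWED);
            }
            // Not a permanent denial: the caller should retry once the
            // authorizer has caught up with the metadata log.
            throw new AuthorizerNotReadyException();
        }
        // Normal ACL evaluation would go here; this sketch simply denies.
        return Collections.nCopies(actions.size(), AuthorizationResult.DENIED);
    }
}

Whether the not-ready condition should be an exception, a separate result, or something keyed off the listener is exactly the part of the design still under discussion in this thread.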