Hi all,

Thanks, everyone. I've updated the KIP.
I reorganized the design section to explain bootstrapping a bit better, and put it first. (It was always weird that the design section came after the interface changes section :) ). I also added early.start.listeners and a section on the other configs (which are the same as in AclAuthorizer).

best,
Colin

On Thu, Jan 27, 2022, at 16:38, Jason Gustafson wrote:
>> My intention here was to avoid exposing Kafka clients to AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal components (such as the broker) then things are a lot simpler. Keep in mind there are external management systems that assume that if the broker ports accept traffic, the broker is "up". For example, Kubernetes operators, external dashboards and health checks, etc.
>
> What I had in mind is that AUTHORIZER_NOT_READY_ERROR would only be returned in Envelope responses (and only if the client principal is not a super user). The broker would retry the operation until the request timeout after seeing this error, which is similar to how we handle controller unavailability in general. So the error is not seen by client requests. It sounds like what you are suggesting instead is having a dedicated listener for inter-cluster traffic? Can you clarify the rules that we use to decide which listener to send forwarded requests to?
>
>> We've discussed this issue in the past in the context of the broker. Should we fence if the broker can't apply metadata for a certain amount of time? It's controversial because fencing may add more load to an already loaded cluster. However, without new metadata, we won't get new ACL changes, learn about new topics, etc. etc.
>
> Yep, I appreciate the concern. It is a bit different in the context of the controller though. Without having access to the quorum, there is little a controller can do anyway. That is why we filed this issue: https://issues.apache.org/jira/browse/KAFKA-13621. That said, I agree it would be desirable if the same approach could be used by both the broker and controller.
>
> -Jason
>
> On Thu, Jan 27, 2022 at 4:14 PM Colin McCabe <cmcc...@apache.org> wrote:
>
>> On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
>> > Hey Colin,
>> >
>> > If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`, then would we need `early.start.listeners`? This would give the `Authorizer` a way to support partial startup, but it would be up to the implementation to decide whether to use it. For an `Authorizer` which depends only on an external service, there is no dependence on the metadata log, so there is no reason to have partial startup. For the `StandardAuthorizer`, we could let it return a completed future from `Authorizer.start` for all of the controller listeners. Then we can let it authorize only super.user operations until the metadata log has been loaded (and raise AuthorizerNotReadyException for everything else until that time).
>> >
>> Hi Jason,
>>
>> My intention here was to avoid exposing Kafka clients to AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal components (such as the broker) then things are a lot simpler. Keep in mind there are external management systems that assume that if the broker ports accept traffic, the broker is "up". For example, Kubernetes operators, external dashboards and health checks, etc.
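
To make the suggestion quoted above from the 15:52 mail (letting Authorizer.start return an already-completed future for the controller listeners) a bit more concrete, here is a minimal sketch. It only mirrors the shape of the existing Authorizer.start(AuthorizerServerInfo) contract; the class and field names are illustrative and not part of the KIP:

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

import org.apache.kafka.common.Endpoint;
import org.apache.kafka.server.authorizer.AuthorizerServerInfo;

// Sketch only: complete the startup future for early start listeners
// immediately, and hold back the futures for all other listeners until the
// metadata log has been loaded.
public class EarlyStartFutures {
    // Listener names to treat as early start (e.g. the controller listeners).
    private final Set<String> earlyStartListenerNames;
    // Completed once the authorizer has loaded its ACLs from the metadata log.
    private final CompletableFuture<Void> metadataLoadedFuture = new CompletableFuture<>();

    public EarlyStartFutures(Set<String> earlyStartListenerNames) {
        this.earlyStartListenerNames = earlyStartListenerNames;
    }

    // Same shape as Authorizer.start(AuthorizerServerInfo): one future per endpoint.
    public Map<Endpoint, ? extends CompletionStage<Void>> start(AuthorizerServerInfo serverInfo) {
        Map<Endpoint, CompletableFuture<Void>> futures = new HashMap<>();
        Collection<Endpoint> endpoints = serverInfo.endpoints();
        for (Endpoint endpoint : endpoints) {
            String listenerName = endpoint.listenerName().orElse("");
            if (earlyStartListenerNames.contains(listenerName)) {
                // Early start listeners can be brought up right away; until the
                // metadata is loaded they would only authorize super.users.
                futures.put(endpoint, CompletableFuture.completedFuture(null));
            } else {
                // All other listeners wait for the metadata log to be loaded.
                futures.put(endpoint, metadataLoadedFuture);
            }
        }
        return futures;
    }

    // Called once the initial metadata snapshot has been applied.
    public void completeMetadataLoad() {
        metadataLoadedFuture.complete(null);
    }
}

As noted later in the thread, broker listener startup is already delayed until the Authorizer is ready, so the only new decision here is which listeners get a pre-completed future.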
>>
>> We can continue to uphold the guarantee that the broker is up if the port is responsive if we annotate only internal ports (like the controller port, or possibly inter-broker port) as "early start." This also helps with the case where people have some internal topic they want to be able to read or write from before turning on client traffic. For example, some people send Kafka metric data to an internal Kafka topic.
>>
>> >> Leaving aside the preceding discussion, do you agree with starting up all endpoints (including non-early start ones) once we load a metadata snapshot? How feasible would it be for us to get a callback from the Raft layer the first time we caught up to the last stable offset? (we only want the callback the first time, not any other time). (I think the metadata shell also would want something like this, at least as an option).
>> >
>> > It is certainly possible to get a callback when we've reached the high watermark, which seems like what we'd want. This does make me wonder how we would handle the case when a controller gets partitioned from the rest of the quorum for some time. Should there be some point at which the controller no longer accepts authorization requests because of the potential staleness of the ACLs? Perhaps we could even reuse the `AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other words, maybe the readiness condition is more of a dynamic assertion that we have caught up to near the current end of the log.
>> >
>> We've discussed this issue in the past in the context of the broker. Should we fence if the broker can't apply metadata for a certain amount of time? It's controversial because fencing may add more load to an already loaded cluster. However, without new metadata, we won't get new ACL changes, learn about new topics, etc. etc.
>>
>> This is probably something we should discuss in a separate KIP, since it's a bigger issue about how we handle nodes with stale metadata. And of course this issue does exist in the current ZK implementation -- you can talk to a broker that has been fenced from ZK for days, and there you are relying on whatever ACLs were in place when it could last talk to ZK. If we implement bounded staleness for KRaft, we may want to consider doing the same for ZK, for consistency.
>>
>> Another approach, which I suspect is more realistic in practice, is to have a metric tracking staleness and fire off an alert when it grows too high. Making the stale node inaccessible is sort of the nuclear option, and may tend to make a bad situation worse in practice.
>>
>> best,
>> Colin
>>
>> > On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <cmcc...@apache.org> wrote:
>> >
>> >> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
>> >> > Hi Colin,
>> >> >
>> >> > Thanks for the KIP. A couple of questions:
>> >> >
>> >> > 1) The KIP says:
>> >> > *However, when the controller is active, its Authorizer state may be slightly ahead of the broker's Authorizer state. This will happen in the time between when a new ACL is created (or deleted) and the time that this change is persisted durably by a majority of controller quorum peers.*
>> >> >
>> >> > Once the ACL takes effect on the controller, do we guarantee that it will be applied everywhere or can there be scenarios where the controller has applied the change, but others don't due to a subsequent failure in persisting the ACL changes?
>> >>
>> >> Hi Rajini,
>> >>
>> >> Thanks for taking a look.
>> >>
>> >> Good question. In general, for any new metadata record (not just ACL records), there are two fates:
>> >> 1. the record will be persisted to the raft quorum.
>> >> 2. the raft leader will fail and a new leader will be elected that doesn't have the new record.
>> >>
>> >> In the case of #2, the standby controllers and the brokers will never apply the record.
>> >>
>> >> In general the active controller always rolls back its state to the last committed offset if there is a failure and it loses the leadership. So this isn't really unique to the KRaft Authorizer, it just seemed worth pointing out since there will be a brief period when the active controller is "ahead."
>> >>
>> >> We do try pretty hard to apply KIP-801 ACL records in order so that the states you will see on brokers and standby controllers will always be valid states from some point in the past timeline. The consistency here should be at least as good as the current system.
>> >>
>> >> > 2) Have we considered using a different config with limited privileges for bootstrapping instead of the all-powerful *super.users*?
>> >>
>> >> That's an interesting idea, but I'm not sure how much we could limit it. The brokers and controllers at least need CLUSTER_ACTION on CLUSTER in order to do most of what they do. This might be something we could explore in a future KIP since it's pretty cross-cutting (if we had such a limited bootstrapping user, all the authorizers could implement it, not just the KIP-801 one...)
>> >>
>> >> best,
>> >> Colin
>> >>
>> >> > Regards,
>> >> >
>> >> > Rajini
>> >> >
>> >> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cmcc...@apache.org> wrote:
>> >> >
>> >> >> How about this:
>> >> >>
>> >> >> We create a configuration key called early.start.listeners which contains a list of listener names. If this is not specified, its value defaults to just the controller listeners. Optionally, other listeners can be added too.
>> >> >>
>> >> >> If super.users contains any user names, early start listeners will start immediately. In the beginning they only authorize users that are in super.users. All other listeners receive a new error code, AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user names, then early start listeners will not be treated differently than other listeners.
>> >> >>
>> >> >> This will allow the controller listeners to get started immediately if the broker user is in super.users, which will speed up startup. It also will be useful for breaking chicken/egg cycles like needing to pull the SCRAM metadata to authorize pulling the SCRAM metadata.
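
For reference, under the proposal quoted just above, the relevant configuration might look roughly like this. The listener names and principals are only examples, and early.start.listeners is the new key being proposed; the other keys already exist for KRaft mode:

# Example only -- listener names, ports, and principals are illustrative.
listeners=CONTROLLER://:9093,BROKER://:9092
controller.listener.names=CONTROLLER

# New key proposed in this thread. If unset, it defaults to the controller listeners.
early.start.listeners=CONTROLLER

# Principals that early start listeners will authorize before the metadata log is loaded.
super.users=User:kafka;User:admin

With a layout like this, the CONTROLLER listener could come up immediately and authorize only User:kafka and User:admin, while the BROKER listener would stay closed until the metadata log (and therefore the ACLs) had been loaded.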
>> >> >>
>> >> >> There are still a few use cases where super.users won't be required, but it may be useful in many cases to have this early start functionality.
>> >> >>
>> >> >> Leaving aside the preceding discussion, do you agree with starting up all endpoints (including non-early start ones) once we load a metadata snapshot? How feasible would it be for us to get a callback from the Raft layer the first time we caught up to the last stable offset? (we only want the callback the first time, not any other time). (I think the metadata shell also would want something like this, at least as an option).
>> >> >>
>> >> >> best,
>> >> >> Colin
>> >> >>
>> >> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
>> >> >> > Hi Colin,
>> >> >> >
>> >> >> > Thanks for the writeup. I had one question about bootstrapping. For the brokers, I understand that listener startup is delayed until the Authorizer is ready. However, I was not very clear on how this would work for the controller listeners. We may need them to start up before the metadata log is ready so that a quorum can be established (as noted in the KIP). This works fine if we assume that the controller principals are among `super.users`. For requests forwarded from brokers, on the other hand, we need to ensure the ACLs have been loaded properly before we begin authorizing. The problem is that we currently use the same listener for quorum requests and for forwarded requests. So my question is how does the Authorizer communicate to the controller when it is safe to begin authorizing different request types?
>> >> >> >
>> >> >> > There are a couple ways I can see this working. First, we could allow the user to configure the listener used for forwarded requests separately. That would work with the existing `Authorizer.start` API. Alternatively, perhaps we could modify `Authorizer.start` to work with something more granular than `EndPoint`. This would allow the controller to begin accepting requests from the other quorum members before it is ready to authorize forwarded requests from clients. Then we would need some way to let brokers know when the controller is ready to accept these forwarded requests (e.g. through an error code in the `Envelope` response).
>> >> >> >
>> >> >> > What do you think?
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Jason
>> >> >> >
>> >> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur <david.art...@confluent.io.invalid> wrote:
>> >> >> >
>> >> >> >> +1 binding, thanks Colin!
>> >> >> >>
>> >> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cmcc...@apache.org> wrote:
>> >> >> >>
>> >> >> >> > Hi all,
>> >> >> >> >
>> >> >> >> > I'd like to start the vote on KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata
>> >> >> >> >
>> >> >> >> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
>> >> >> >> >
>> >> >> >> > The original DISCUSS thread is here:
>> >> >> >> > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>> >> >> >> >
>> >> >> >> > Please take a look and vote if you can.
>> >> >> >> >
>> >> >> >> > best,
>> >> >> >> > Colin
>> >> >> >>
>> >> >> >> --
>> >> >> >> -David
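
As a rough sketch of the other half of the mechanism discussed in this thread -- authorizing only super.users until the metadata log has been loaded, and signalling "not ready" rather than DENIED for everyone else so that internal callers (such as forwarded Envelope requests) can be retried -- something along these lines could work. The class, the helper names, and AuthorizerNotReadyException are all illustrative stand-ins, not a committed API; only the authorize() signature mirrors the existing Authorizer interface:

import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.apache.kafka.common.security.auth.KafkaPrincipal;
import org.apache.kafka.server.authorizer.Action;
import org.apache.kafka.server.authorizer.AuthorizableRequestContext;
import org.apache.kafka.server.authorizer.AuthorizationResult;

// Sketch only: gate authorization on whether the ACLs have been loaded yet.
public class BootstrapAuthorizerGate {

    // Stand-in for however "authorizer not ready" is ultimately surfaced
    // (e.g. mapped to AUTHORIZER_NOT_READY_ERROR in an Envelope response).
    public static class AuthorizerNotReadyException extends RuntimeException {
    }

    private final Set<KafkaPrincipal> superUsers;       // parsed from super.users
    private volatile boolean metadataLoaded = false;    // flipped once caught up

    public BootstrapAuthorizerGate(Set<KafkaPrincipal> superUsers) {
        this.superUsers = superUsers;
    }

    public void markMetadataLoaded() {
        metadataLoaded = true;
    }

    // Same shape as Authorizer.authorize(AuthorizableRequestContext, List<Action>).
    public List<AuthorizationResult> authorize(AuthorizableRequestContext context, List<Action> actions) {
        if (!metadataLoaded) {
            if (superUsers.contains(context.principal())) {
                // super.users are allowed even before any ACLs have been loaded.
                return Collections.nCopies(actions.size(), AuthorizationResult.ALLOWED);
            }
            // Not a permanent denial: the caller should retry once the
            // authorizer has caught up with the metadata log.
            throw new AuthorizerNotReadyException();
        }
        // Normal ACL evaluation would go here; this sketch simply denies.
        return Collections.nCopies(actions.size(), AuthorizationResult.DENIED);
    }
}

Whether the not-ready condition should be an exception, a separate result, or something keyed off the listener is exactly the part of the design still under discussion in this thread.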