Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Colin McCabe Thu, 27 Jan 2022 16:14:36 -0800

On Thu, Jan 27, 2022, at 15:52, Jason Gustafson wrote:
> Hey Colin,
>
> If we allow the `Authorizer` to raise `AUTHORIZER_NOT_READY_ERROR`, then
> would we need `early.start.listeners`? This would give the `Authorizer` a
> way to support partial startup, but it would be up to the implementation to
> decide whether to use it. For an `Authorizer` which depends only on an
> external service, there is no dependence on the metadata log, so there is
> no reason to have partial startup. For the `StandardAuthorizer`, we could
> let it return a completed future from `Authorizer.start` for all of the
> controller listeners. Then we can let it authorize only super.user
> operations until the metadata log has been loaded (and raise
> AuthorizerNotReadyException for everything else until that time).
>


Hi Jason,

My intention here was to avoid exposing Kafka clients to 
AUTHORIZER_NOT_READY_ERROR. If this error is only ever returned to internal 
components (such as the broker) then things are a lot simpler. Keep in mind 
there are external management systems that assume that if the broker ports 
accept traffic, the broker is "up". For example, Kubernetes operators, external 
dashboards and health checks, etc.

We can continue to uphold the guarantee that the broker is up if the port is 
responsive if we annotate only internal ports (like the controller port, or 
possibly inter-broker port) as "early start." This also helps with the case 
where people have some internal topic they want to be able to read or write 
from before turning on client traffic. For example, some people send Kafka 
metric data to an internal Kafka topic.

>> Leaving aside the preceding discussion, do you agree with starting up all
> endpoints (including non-early start ones) once we load a metadata
> snapshot? How feasible would it be for us to get a callback from the Raft
> layer the first time we caught up to the last stable offset? (we only want
> the callback the first time, not any other time). (I think the metadata
> shell also would want something like this, at least as an option).
>
> It is certainly possible to get a callback when we've reached the high
> watermark, which seems like what we'd want. This does make me wonder how we
> would handle the case when a controller gets partitioned from the rest of
> the quorum for some time. Should there be some point at which the
> controller no longer accepts authorization requests because of the
> potential staleness of the ACLs? Perhaps we could even reuse the
> `AUTHORIZER_NOT_READY_ERROR` error code in this condition. In other words,
> maybe the readiness condition is more of a dynamic assertion that we have
> caught up to near the current end of the log.
>

We've discussed this issue in the past in the context of the broker. Should we 
fence if the broker can't apply metadata for a certain amount of time? It's 
controversial because fencing may add more load to an already loaded cluster. 
However, without new metadata, we won't get new ACL changes, learn about new 
topics, etc. etc.

This is probably something we should discuss in a separate KIP, since it's a 
bigger issue about how we handle nodes with stale metadata. And of course this 
issue does exist in the current ZK implementation -- you can talk to a broker 
that has been fenced from ZK for days, and there you are relying on whatever 
ACLs were in place when it could last talk to ZK. If we implement bounded 
staleness for KRaft, we may want to consider doing the same for ZK, for 
consistency.

Another approach, which I suspect is more realistic in practice, is to have a 
metric tracking staleness and fire off an alert when it grows too high. Making 
the stale node inaccessible is sort of the nuclear option, and may tend to make 
a bad situation worse in practice.

best,
Colin


>
> On Wed, Jan 26, 2022 at 10:23 AM Colin McCabe <cmcc...@apache.org> wrote:
>
>> On Wed, Jan 26, 2022, at 01:54, Rajini Sivaram wrote:
>> > Hi Colin,
>> >
>> > Thanks for the KIP. A couple of questions:
>> >
>> > 1) The KIP says:
>> > *However, when the controller is active, its Authorizer state may be
>> > slightly ahead of the the broker's Authorizer state. This will happen in
>> > the time between when a new ACL is created (or deleted) and the time that
>> > this change is persisted durably by a majority of controller quorum
>> peers.*
>> >
>> > Once the ACL takes effect on the controller, do we guarantee that it will
>> > be applied everywhere or can there be scenarios where the controller has
>> > applied the change, but others don't due to a subsequent failure in
>> > persisting the ACL changes?
>>
>> Hi Rajini,
>>
>> Thanks for taking a look.
>>
>> Good question. In general, for any new metadata record (not just ACL
>> records), there are two fates:
>> 1. the record will be persisted to the raft quorum.
>> 2. the raft leader will fail and a new leader will be elected that doesn't
>> have the new record.
>>
>> In the case of #2, the standby controllers and the brokers will never
>> apply the record.
>>
>> In general the active controller always rolls back its state to the last
>> committed offset if there is a failure and it loses the leadership. So this
>> isn't really unique to the KRaft Authorizer, it just seemed worth pointing
>> out since there will be a brief period when the active controller is
>> "ahead."
>>
>> We do try pretty hard to apply KIP-801 ACL records in order so that the
>> states you will see on brokers and standby controllers will always be valid
>> states from some point in the past timeline. The consistency here should be
>> at least as good as the current system.
>>
>> >
>> > 2) Have we considered using a different config with limited privileges
>> for
>> > bootstrapping instead of the all-powerful *super.users*?
>> >
>>
>> That's an interesting idea, but I'm not sure how much we could limit it.
>> The brokers and controllers at least need CLUSTER_ACTION on CLUSTER in
>> order to do most of what they do. This might be something we could explore
>> in a future KIP since it's pretty cross-cutting (if we had such a limited
>> bootstrapping user, all the authorizers could implement it, not just the
>> KIP-801 one...)
>>
>> best,
>> Colin
>>
>> >
>> > Regards,
>> >
>> > Rajini
>> >
>> >
>> > On Wed, Jan 26, 2022 at 1:50 AM Colin McCabe <cmcc...@apache.org> wrote:
>> >
>> >> How about this:
>> >>
>> >> We create a configuration key called early.start.listeners which
>> contains
>> >> a list of listener names. If this is not specified, its value defaults
>> to
>> >> just the controller listeners. Optionally, other listeners can be added
>> too.
>> >>
>> >> If super.users contains any user names, early start listeners will start
>> >> immediately. In the beginning they only authorize users that are in
>> >> super.users. All other listeners receive a new error code,
>> >> AUTHORIZER_NOT_READY_ERROR. If super.users does not contain any user
>> names,
>> >> then early start listeners will not be treated differently than other
>> >> listeners.
>> >>
>> >> This will allow the controller listeners to get started immediately if
>> the
>> >> broker user is in super.users, which will speed up startup. It also
>> will be
>> >> useful for breaking chicken/egg cycles like needing to pull the SCRAM
>> >> metadata to authorize pulling the SCRAM metadata.
>> >>
>> >> There are still a few use cases where super.users won't be required, but
>> >> it may be useful in many cases to have this early start functionality.
>> >>
>> >> Leaving aside the preceding discussion, do you agree with starting up
>> all
>> >> endpoints (including non-early start ones) once we load a metadata
>> >> snapshot? How feasible would it be for us to get a callback from the
>> Raft
>> >> layer the first time we caught up to the last stable offset? (we only
>> want
>> >> the callback the first time, not any other time). (I think the metadata
>> >> shell also would want something like this, at least as an option).
>> >>
>> >> best,
>> >> Colin
>> >>
>> >>
>> >> On Tue, Jan 25, 2022, at 13:34, Jason Gustafson wrote:
>> >> > Hi Colin,
>> >> >
>> >> > Thanks for the writeup. I had one question about bootstrapping. For
>> the
>> >> > brokers, I understand that listener startup is delayed until the
>> >> Authorizer
>> >> > is ready. However, I was not very clear how this would work for the
>> >> > controller listeners. We may need them to startup before the metadata
>> log
>> >> > is ready so that a quorum can be established (as noted in the KIP).
>> This
>> >> > works fine if we assume that the controller principals are among
>> >> > `super.users`. For requests forwarded from brokers, on the other
>> hand, we
>> >> > need to ensure the ACLs have been loaded properly before we begin
>> >> > authorizing. The problem is that we currently use the same listener
>> for
>> >> > quorum requests and for forwarded requests. So my question is how does
>> >> the
>> >> > Authorizer communicate to the controller when it is safe to begin
>> >> > authorizing different request types?
>> >> >
>> >> > There are a couple ways I can see this working. First, we could allow
>> the
>> >> > user to configure the listener used for forwarded requests separately.
>> >> That
>> >> > would work with the existing `Authorizer.start` API. Alternatively,
>> >> perhaps
>> >> > we could modify `Authorizer.start` to work with something more
>> granular
>> >> > than `EndPoint`. This would allow the controller to begin accepting
>> >> > requests from the other quorum members before it is ready to authorize
>> >> > forwarded requests from clients.  Then we would need some way to let
>> >> > brokers know when the controller is ready to accept these forwarded
>> >> > requests (e.g. through an error code in the `Envelope` response).
>> >> >
>> >> > What do you think?
>> >> >
>> >> > Thanks,
>> >> > Jason
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Jan 12, 2022 at 12:57 PM David Arthur
>> >> > <david.art...@confluent.io.invalid> wrote:
>> >> >
>> >> >> +1 binding, thanks Colin!
>> >> >>
>> >> >> On Mon, Dec 13, 2021 at 7:47 PM Colin McCabe <cmcc...@apache.org>
>> >> wrote:
>> >> >>
>> >> >> > Hi all,
>> >> >> >
>> >> >> > I'd like to start the vote on KIP-801: Implement an Authorizer that
>> >> >> stores
>> >> >> > metadata in __cluster_metadata
>> >> >> >
>> >> >> > The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw
>> >> >> >
>> >> >> > The original DISCUSS thread is here:
>> >> >> >
>> >> >> > https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq
>> >> >> >
>> >> >> > Please take a look and vote if you can.
>> >> >> >
>> >> >> > best,
>> >> >> > Colin
>> >> >> >
>> >> >>
>> >> >>
>> >> >> --
>> >> >> -David
>> >> >>
>> >>
>>

Re: [VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

Reply via email to