Re: [VOTE] KIP-590: Redirect Zookeeper Mutation Protocols to The Controller

David Jacot Thu, 30 Jul 2020 07:17:13 -0700

Hi Boyang,

Thanks for your answers.


> The point for using the listener name is more of a security purpose, to
> detect any forged request to our best effort.
> For throttling I think we could just check the request header for
> *InitialClientId* existence, to distinguish whether to apply
> throttling strategy as forwarded request or direct request.

Reading "security" and "best effort" in the same sentence makes me a
little nervous :).

The identification issue is also valid for quota as we don't want one to be
able to bypass the quota by forging a request as well, isn't it? Otherwise,
anyone could just set the InitialPrincipal to bypass it. I think that we
should
only use InitialPrincipal and/or InitialClientId when we know that they come
from another broker. Based on what I read in the KIP, it looks like we could
only use them when the principal has CLUSTER_ACTION privilege. Do I
understand it correctly?

I have made another pass on the whole KIP, I have few nits:

- The sentence "Take AlterConfig as an example to understand the changes
we are making." does not make much sense anymore in the beginning of the
"Proposed Changes" chapter.

- When you say "Existing RPCs which are sending directly to the controller
will
rely on forwarding as well.". I suggest to explicitly mention how "old
admin clients"
will work here to complement the sentence. Something like: They will get a
random
broker id as the controller id in the metadata response and stick to it as
you explained.

- "The purpose of adding principal name is for the audit logging, and the
client id is
being used to throttling according to KIP-599 requirement." Actually,
KIP-599 needs
both the principal and the clientId.

- In the "Routing Request Security" chapter. It is written that the
forwarding broker
will verify the request with its own authorizer and will just forward it if
the request
looks good. When a request contains for instance multiple topics, I suppose
that
we will forward only the authorized ones and not the whole original request
as is.
We may want to reword the sentence to make this clear.

- For the record, should we put the previous proposal in the rejected
alternatives as
well?

Best,
David

On Thu, Jul 30, 2020 at 3:51 AM Boyang Chen <reluctanthero...@gmail.com>
wrote:

> Thanks David for the feedback!
>
> On Wed, Jul 29, 2020 at 7:53 AM David Jacot <dja...@confluent.io> wrote:
>
> > Hi, Colin, Boyang,
> >
> > Colin, thanks for the clarification. Somehow, I thought that even if the
> > controller is ran independently, it
> > would still run the listeners of the broker and thus would be accessible
> by
> > redirecting on the loopback
> > interface. My mistake.
> >
> > Boyang, I have few questions/comments regarding the updated KIP:
> >
> > 1. I think that it would be great if we could clarify how old admin
> clients
> > which are directly talking to the
> > controller will work with this KIP. I read between the lines that, as we
> > propose to provide a random
> > broker Id as the controller Id in the metadata response, they will use a
> > single node as a proxy. Is that
> > correct? This deserves to be called out more explicitly in the design
> > section instead of being hidden
> > in the protocol bump of the metadata RPC.
> >
> > Makes sense, I stress this point in the compatibility section.
>
>
> > 1.1 If I understand correctly, could we assume that old admin clients
> will
> > stick to the same "fake controller"
> > until they refresh their metadata? Refreshing the metadata usually
> happens
> > when NOT_CONTROLLER
> > is received but this won't happen anymore so they should change
> > infrequently.
> >
> > That is correct, old admin clients would not try to refresh their
> metadata
> due to NOT_CONTROLLER,
> which is impossible to happen with the new broker cluster.
>
>
> > 2. For the new admin client, I suppose that we plan on using
> > LeastLoadedNodeProvider for the
> > requests that are using ControllerNodeProvider. We could perhaps mention
> > it.
> >
> > Sure, added.
>
>
> > 3. Pre KIP-500, will we have a way to distinguish if a request that is
> > received by the controller is
> > coming directly from a client or from a broker? You mention that the
> > listener can be used to do
> > this but as you pointed out, it is not mandatory. Do we have another
> > reliable method? I am asking
> > in the context of KIP-599 with the current controller, we may need to
> > throttle differently if the
> > request comes from a client or from a broker.
> >
> > The point for using the listener name is more of a security purpose, to
> detect any forged request to our best effort.
> For throttling I think we could just check the request header for
> *InitialClientId* existence, to distinguish whether to apply
> throttling strategy as forwarded request or direct request.
>
>
> > 4. Could we add `InitialClientId` as well? This will be required for the
> > quota as we can apply them
> > by principal and/or clientId.
> >
> > Sounds good, added.
>
>
> > 5. A small remark regarding the structure of the KIP. It is a bit weird
> > that requests that do not go
> > to the controller are mentioned in the Proposed Design section and the
> > requests that go to the
> > controller are mentioned in the Public Interfaces. When one read the
> > Proposed Design, it does not
> > get a full picture of the whole new routing proposal for old and new
> > clients. It would be great if we
> > could have a full overview in that section.
> >
> > Good point, I will move the pieces around.
>
>
> > Overall the change makes sense to me. I will work on drafting an addendum
> > to KIP-599 to
> > alter the design to cope with these changes. At a first glance, that
> seems
> > doable if 1.1, 3
> > and 4 are OK.
> >
> > Thank you for the help!
>
>
> > Thanks,
> > David
> >
> > On Wed, Jul 29, 2020 at 5:29 AM Boyang Chen <reluctanthero...@gmail.com>
> > wrote:
> >
> > > Thanks for the feedback Colin!
> > >
> > > On Tue, Jul 28, 2020 at 2:11 PM Colin McCabe <cmcc...@apache.org>
> wrote:
> > >
> > > > Hi Boyang,
> > > >
> > > > Thanks for updating this.  A few comments below:
> > > >
> > > > In the "Routing Request Security" section, there is a reference to
> > "older
> > > > requests that need redirection."  But after these new revisions, both
> > new
> > > > and old requests need redirection.  So we should rephrase this.
> > > >
> > > > > In addition, to avoid exposing this forwarding power to the admin
> > > > clients,
> > > > > the routing request shall be forwarded towards the controller
> broker
> > > > internal
> > > > > endpoint which should be only visible to other brokers inside the
> > > > cluster
> > > > > in the KIP-500 controller. Any admin configuration request with
> > broker
> > > > > principal should not be going through the public endpoint and will
> be
> > > > > rejected for security purpose.
> > > >
> > > > We should also describe how this will work in the pre-KIP-500 case.
> In
> > > > that case, CLUSTER_ACTION gets the extra permissions described here
> > only
> > > > when the message comes in on the inter-broker listener.  We should
> > state
> > > > that here.
> > > >
> > > > (I can see that you have this information later on in the "Security
> > > Access
> > > > Changes" section, but it would be good to have it here as well, to
> > avoid
> > > > confusion.)
> > > >
> > > > > To be more strict of protecting controller information, the
> > > > "ControllerId"
> > > > > field in new MetadataResponse shall be set to -1 when the original
> > > > request
> > > > > comes from a non-broker client and it is already on v10. We shall
> use
> > > the
> > > > > request listener name to distinguish whether a given request is
> > > > inter-broker,
> > > > > or from the client.
> > > >
> > > > I'm not sure why we would need to distinguish between broker clients
> > and
> > > > non-broker clients.  Brokers don't generally send MetadataRequests to
> > > other
> > > > brokers, do they?  Brokers learn about metadata from
> > > UpdateMetadataRequest
> > > > and LeaderAndIsrRequest, not by sending MetadataRequests to other
> > > brokers.
> > > >
> > > > We do have one use case where the MetadataRequest gets sent between
> the
> > > brokers, which is the InterBrokerSendThread. Currently we don't rely on
> > it
> > > to get the controller id, so I guess your suggestion should be good to
> > > enforce. We could use some meta comment on the NetworkClient that it
> > should
> > > not be used to get the controller location.
> > >
> > > Probably what we want here is: v0-v9: return a random broker in the
> > cluster
> > > > as the controller ID.  v10: no controllerID present in the
> > > > MetadataResponse.  We should also deprecate the adminClient method
> > which
> > > > gets the controllerId.
> > > >
> > > > > BROKER_AUTHORIZATION_FAILURE(92, "Authorization failed for the
> > > > > request during forwarding, this indicates an internal error on the
> > > broker
> > > > > cluster security setup.",
> BrokerAuthorizationFailureException::new);
> > > >
> > > > Grammar nitpick: It would be good to have a period between
> "forwarding"
> > > > and "this" to avoid a run-on sentence :)
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Mon, Jul 27, 2020, at 21:47, Boyang Chen wrote:
> > > > > Hey there,
> > > > >
> > > > > I'm re-opening this thread because after some initial
> implementations
> > > > > started, we spotted some gaps in the approved KIP as well as some
> > > > > inconsistencies with KIP-631 controller. There are a couple of
> > > addendums
> > > > to
> > > > > the existing KIP, specifically:
> > > > >
> > > > > 1. As the controller is foreseen to be only accessible to the
> > brokers,
> > > > the
> > > > > new admin client would not have direct access to the controller. It
> > is
> > > > > guaranteed on the MetadataResponse level which no longer provides
> > > > > `ControllerId` to client side requests.
> > > > >
> > > > > 2. The broker would forward any direct ZK path mutation requests,
> > > > including
> > > > > topic creation/deletion, reassignment, etc since we deprecate the
> > > direct
> > > > > controller access on the client side. No more protocol version bump
> > is
> > > > > necessary for the configuration requests.
> > > > >
> > > > > 3. To make sure forwarding requests pass the authorization, broker
> > > > > principal CLUSTER_ACTION would be allowed to be used as an
> > alternative
> > > > > authentication method for a variety of principal operations,
> > including
> > > > > ALTER, ALTER_CONFIG, DELETE, etc. It is because the forwarding
> > request
> > > > > needs to use the proxy broker's own principal, which is currently
> not
> > > > > supported to be used for many configuration change authentication
> > > listed
> > > > > above. The full list could be found in the KIP.
> > > > >
> > > > > 4. Add a new BROKER_AUTHORIZATION_FAILURE error code to indicate
> any
> > > > > internal security configuration failure, when the forwarded request
> > > > failed
> > > > > authentication on the controller side.
> > > > >
> > > > > Let me know what you think. With such a major refinement of the
> KIP,
> > > I'm
> > > > > open for re-vote after discussions converge.
> > > > >
> > > > > Boyang
> > > > >
> > > > > On Wed, Jul 1, 2020 at 2:17 PM Boyang Chen <
> > reluctanthero...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hey folks,
> > > > > >
> > > > > > I have also synced on the KIP-578 which was doing the partition
> > > limit,
> > > > to
> > > > > > make sure the partition limit error code would be properly
> > propagated
> > > > once
> > > > > > it is done on top of KIP-590. Let me know if you have further
> > > > questions or
> > > > > > concerns.
> > > > > >
> > > > > > Boyang
> > > > > >
> > > > > > On Tue, Jun 23, 2020 at 5:08 PM Boyang Chen <
> > > > reluctanthero...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Thanks for the clarification, Colin and Ismael. Personally I
> also
> > > feel
> > > > > >> Option A is better to prioritize fixing the gap. Just to be
> clear,
> > > the
> > > > > >> proposed solution would be:
> > > > > >>
> > > > > >> 1. Bump the Metadata RPC version to return POLICY_VIOLATION. In
> > the
> > > > > >> application level, we should swap the error message with the
> > actual
> > > > failure
> > > > > >> reason such as "violation of topic creation policy when
> attempting
> > > to
> > > > auto
> > > > > >> create internal topic through MetadataRequest."
> > > > > >>
> > > > > >> 2. For older Metadata RPC, return AUTHORIZATION_FAILED to fail
> > fast.
> > > > > >>
> > > > > >> Will address our other discussed points as well in the KIP, let
> me
> > > > know
> > > > > >> if you have further questions.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Boyang
> > > > > >>
> > > > > >> On Tue, Jun 23, 2020 at 10:41 AM Ismael Juma <ism...@juma.me.uk
> >
> > > > wrote:
> > > > > >>
> > > > > >>> Option A is basically what I was thinking. But with a slight
> > > > adjustment:
> > > > > >>>
> > > > > >>> New versions of MetadataResponse return POLICY_VIOLATION, old
> > > > versions
> > > > > >>> return AUTHORIZATION_FAILED. The latter works correctly with
> old
> > > Java
> > > > > >>> clients (i.e. the client fails fast and propagates the error),
> > I've
> > > > > >>> tested
> > > > > >>> it. Adjust new clients to treat POLICY_VIOLATION like
> > > > > >>> AUTHORIZATION_FAILED,
> > > > > >>> but propagate the custom error message.
> > > > > >>>
> > > > > >>> Ismael
> > > > > >>>
> > > > > >>> On Mon, Jun 22, 2020 at 11:00 PM Colin McCabe <
> > cmcc...@apache.org>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>> > > > > On Fri, Jun 19, 2020 at 3:18 PM Ismael Juma <
> > > > ism...@juma.me.uk>
> > > > > >>> > wrote:
> > > > > >>> > > > >
> > > > > >>> > > > > > Hi Colin,
> > > > > >>> > > > > >
> > > > > >>> > > > > > The KIP states in the Compatibility section (not
> Future
> > > > work):
> > > > > >>> > > > > >
> > > > > >>> > > > > > "To support the proxy of requests, we need to build a
> > > > channel
> > > > > >>> for
> > > > > >>> > > > > > brokers to talk directly to the controller. This part
> > of
> > > > the
> > > > > >>> design
> > > > > >>> > > > > > is internal change only and won’t block the KIP
> > > progress."
> > > > > >>> > > > > >
> > > > > >>> > > > > > I am clarifying that this is not internal only due to
> > the
> > > > > >>> config.
> > > > > >>> > If we
> > > > > >>> > > > > > say that this KIP depends on another KIP before we
> can
> > > > merge
> > > > > >>> > > > > > it, that's fine although it feels a bit unnecessary.
> > > > > >>> > > > > >
> > > > > >>> >
> > > > > >>> > Hi Ismael,
> > > > > >>> >
> > > > > >>> > I didn't realize there was still a reference to the separate
> > > > controller
> > > > > >>> > channel in the "Compatibility, Deprecation, and Migration
> Plan"
> > > > > >>> section.  I
> > > > > >>> > agree that it doesn't really belong there.  Given that this
> is
> > > > creating
> > > > > >>> > confusion, I would suggest that we just drop this from the
> KIP
> > > > > >>> entirely.
> > > > > >>> > It really is orthogonal to what this KIP is about-- we don't
> > > need a
> > > > > >>> > separate channel to implement redirection.
> > > > > >>> >
> > > > > >>> > Boyang wrote:
> > > > > >>> >
> > > > > >>> > >
> > > > > >>> > > We are only opening the doors for specific internal topics
> > > > (offsets,
> > > > > >>> txn
> > > > > >>> > > log), which I assume the client should have no possibility
> to
> > > > mutate
> > > > > >>> the
> > > > > >>> > > topic policy?
> > > > > >>> > >
> > > > > >>> >
> > > > > >>> > Hi Boyang,
> > > > > >>> >
> > > > > >>> > I think you and Ismael are talking about different scenarios.
> > > You
> > > > are
> > > > > >>> > describing the scenario where the broker is auto-creating the
> > > > > >>> transaction
> > > > > >>> > log topic or consumer offset topic.  This scenario indeed
> > should
> > > > not
> > > > > >>> happen
> > > > > >>> > in a properly-configured cluster.  However, Ismael is
> > describing
> > > a
> > > > > >>> scenario
> > > > > >>> > where the client is auto-creating some arbitrary non-internal
> > > topic
> > > > > >>> just by
> > > > > >>> > sending a metadata request.
> > > > > >>> >
> > > > > >>> > As far as I can see, there are two solutions here:
> > > > > >>> >
> > > > > >>> > A. Close the hole in CreateTopicsPolicy immediately.  In new
> > > > versions,
> > > > > >>> > allow MetadataResponse to return AUTHORIZATION_FAILED if we
> > tried
> > > > to
> > > > > >>> > auto-create a topic and failed.  Find some other error code
> to
> > > > return
> > > > > >>> for
> > > > > >>> > existing versions.
> > > > > >>> >
> > > > > >>> > B. Keep the hole in CreateTopicsPolicy and add some
> > configuration
> > > > to
> > > > > >>> allow
> > > > > >>> > admins to gradually migrate to closing it.  In practice, this
> > > > probably
> > > > > >>> > means a configuration toggle that enables direct ZK access,
> > that
> > > > > >>> starts off
> > > > > >>> > as enabled.  Then we can eventually default it to false and
> > then
> > > > > >>> remove it
> > > > > >>> > entirely over time.
> > > > > >>> >
> > > > > >>> > best,
> > > > > >>> > Colin
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-590: Redirect Zookeeper Mutation Protocols to The Controller

Reply via email to