Thanks David for the feedback!

On Wed, Jul 29, 2020 at 7:53 AM David Jacot <dja...@confluent.io> wrote:

> Hi, Colin, Boyang,
>
> Colin, thanks for the clarification. Somehow, I thought that even if the
> controller is run independently, it would still run the broker's listeners
> and thus would be accessible by redirecting on the loopback interface.
> My mistake.
>
> Boyang, I have a few questions/comments regarding the updated KIP:
>
> 1. I think that it would be great if we could clarify how old admin clients
> that talk directly to the controller will work with this KIP. I read
> between the lines that, as we propose to provide a random broker Id as the
> controller Id in the metadata response, they will use a single node as a
> proxy. Is that correct? This deserves to be called out more explicitly in
> the design section instead of being hidden in the protocol bump of the
> metadata RPC.
>
Makes sense, I stressed this point in the compatibility section.


> 1.1 If I understand correctly, could we assume that old admin clients will
> stick to the same "fake controller" until they refresh their metadata?
> Refreshing the metadata usually happens when NOT_CONTROLLER is received,
> but this won't happen anymore, so they should change infrequently.
>
That is correct, old admin clients would not try to refresh their metadata
due to NOT_CONTROLLER, which can no longer happen with the new broker
cluster.
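
To make this concrete, here is a rough sketch of the broker-side handling
(illustrative only, not the actual KafkaApis code):

    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;

    // Sketch: what the broker could advertise as "controller id" per version.
    // v0-v9 clients get a random live broker to use as a stable proxy;
    // v10+ clients get -1, since the real controller is no longer exposed.
    static int controllerIdForResponse(short requestVersion,
                                       List<Integer> liveBrokerIds) {
        if (requestVersion >= 10) {
            return -1;
        }
        // old clients cache this node as "the controller" until their
        // next metadata refresh, so it acts as a sticky proxy
        int index = ThreadLocalRandom.current().nextInt(liveBrokerIds.size());
        return liveBrokerIds.get(index);
    }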


> 2. For the new admin client, I suppose that we plan on using
> LeastLoadedNodeProvider for the
> requests that are using ControllerNodeProvider. We could perhaps mention
> it.
>
Sure, added.
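
For reference, a minimal sketch of what that could look like inside the
admin client (simplified; assumes the existing NodeProvider abstraction in
KafkaAdminClient, and the surrounding client/time fields):

    // Sketch: ControllerNodeProvider no longer needs the real controller id;
    // any broker can proxy the request, so delegate to the least loaded node.
    private class ControllerNodeProvider implements NodeProvider {
        @Override
        public Node provide() {
            // pick the broker with the fewest in-flight requests
            return client.leastLoadedNode(time.milliseconds());
        }
    }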


> 3. Pre-KIP-500, will we have a way to distinguish whether a request that
> is received by the controller is coming directly from a client or from a
> broker? You mention that the listener can be used to do this but, as you
> pointed out, it is not mandatory. Do we have another reliable method? I am
> asking in the context of KIP-599: with the current controller, we may need
> to throttle differently depending on whether the request comes from a
> client or from a broker.
>
The point of using the listener name is more about security: to detect
forged requests on a best-effort basis. For throttling, I think we could
just check the request header for the existence of *InitialClientId*, to
decide whether to apply the throttling strategy for a forwarded request
or a direct request.
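
Something like the following (a sketch; the initialClientId() accessor is
hypothetical until the KIP's header schema change lands):

    // Sketch: pick the throttling strategy based on whether the header
    // carries the original client id set by the forwarding broker.
    static boolean isForwarded(RequestHeader header) {
        return header.initialClientId() != null; // hypothetical accessor
    }

    static String throttleKey(RequestHeader header) {
        // charge forwarded requests to the original client, not the proxy broker
        return isForwarded(header) ? header.initialClientId() : header.clientId();
    }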


> 4. Could we add `InitialClientId` as well? This will be required for the
> quotas, as we can apply them by principal and/or clientId.
>
Sounds good, added.


> 5. A small remark regarding the structure of the KIP. It is a bit weird
> that requests that do not go to the controller are mentioned in the
> Proposed Design section while the requests that go to the controller are
> mentioned in the Public Interfaces section. When one reads the Proposed
> Design, one does not get a full picture of the whole new routing proposal
> for old and new clients. It would be great if we could have a full
> overview in that section.
>
Good point, I will move the pieces around.


> Overall the change makes sense to me. I will work on drafting an addendum
> to KIP-599 to alter the design to cope with these changes. At first
> glance, that seems doable if 1.1, 3 and 4 are OK.
>
Thank you for the help!


> Thanks,
> David
>
> On Wed, Jul 29, 2020 at 5:29 AM Boyang Chen <reluctanthero...@gmail.com>
> wrote:
>
> > Thanks for the feedback, Colin!
> >
> > On Tue, Jul 28, 2020 at 2:11 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > Hi Boyang,
> > >
> > > Thanks for updating this.  A few comments below:
> > >
> > > In the "Routing Request Security" section, there is a reference to
> > > "older requests that need redirection."  But after these new revisions,
> > > both new and old requests need redirection.  So we should rephrase this.
> > >
> > > > In addition, to avoid exposing this forwarding power to the admin
> > > > clients, the routing request shall be forwarded towards the controller
> > > > broker internal endpoint, which should only be visible to other
> > > > brokers inside the cluster in the KIP-500 controller. Any admin
> > > > configuration request with a broker principal should not be going
> > > > through the public endpoint and will be rejected for security purposes.
> > >
> > > We should also describe how this will work in the pre-KIP-500 case.  In
> > > that case, CLUSTER_ACTION gets the extra permissions described here only
> > > when the message comes in on the inter-broker listener.  We should state
> > > that here.
> > >
> > > (I can see that you have this information later on in the "Security
> > > Access Changes" section, but it would be good to have it here as well,
> > > to avoid confusion.)
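
To be concrete, the pre-KIP-500 check could look roughly like this (a
sketch; the context and authorizer calls are illustrative, not the actual
APIs):

    // Sketch: grant the extra CLUSTER_ACTION permissions only when the
    // request arrived on the inter-broker listener (pre-KIP-500).
    boolean forwardingAuthorized(RequestContext context) {
        boolean onInterBrokerListener = context.listenerName()
            .equals(config.interBrokerListenerName()); // illustrative lookup
        // CLUSTER_ACTION on the CLUSTER resource identifies a broker principal
        return onInterBrokerListener
            && authorizer.authorize(context, CLUSTER_ACTION, CLUSTER_RESOURCE);
    }
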
> > >
> > > > To be stricter about protecting controller information, the
> > > > "ControllerId" field in the new MetadataResponse shall be set to -1
> > > > when the original request comes from a non-broker client and it is
> > > > already on v10. We shall use the request listener name to distinguish
> > > > whether a given request is inter-broker, or from the client.
> > >
> > > I'm not sure why we would need to distinguish between broker clients
> > > and non-broker clients.  Brokers don't generally send MetadataRequests
> > > to other brokers, do they?  Brokers learn about metadata from
> > > UpdateMetadataRequest and LeaderAndIsrRequest, not by sending
> > > MetadataRequests to other brokers.
> > >
> > We do have one use case where the MetadataRequest gets sent between the
> > brokers, which is the InterBrokerSendThread. Currently we don't rely on
> > it to get the controller id, so I guess your suggestion should be fine
> > to enforce. We could add a meta comment on the NetworkClient noting that
> > it should not be used to get the controller location.
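
Concretely, the note could be as simple as the following (a sketch;
wording illustrative):

    // Sketch of the suggested NetworkClient caveat:
    /**
     * Note: the controller id carried in MetadataResponses received through
     * this client is no longer meaningful and must not be used to locate the
     * controller. Brokers learn the controller from UpdateMetadataRequest
     * and LeaderAndIsrRequest instead.
     */
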
> >
> > > Probably what we want here is: v0-v9: return a random broker in the
> > > cluster as the controller ID.  v10: no controllerID present in the
> > > MetadataResponse.  We should also deprecate the adminClient method
> > > which gets the controllerId.
> > >
> > > > BROKER_AUTHORIZATION_FAILURE(92, "Authorization failed for the
> > > > request during forwarding, this indicates an internal error on the
> > > > broker cluster security setup.", BrokerAuthorizationFailureException::new);
> > >
> > > Grammar nitpick: It would be good to have a period between "forwarding"
> > > and "this" to avoid a run-on sentence :)
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, Jul 27, 2020, at 21:47, Boyang Chen wrote:
> > > > Hey there,
> > > >
> > > > I'm re-opening this thread because, after some initial implementations
> > > > started, we spotted some gaps in the approved KIP as well as some
> > > > inconsistencies with the KIP-631 controller. There are a couple of
> > > > addendums to the existing KIP, specifically:
> > > >
> > > > 1. As the controller is foreseen to be only accessible to the brokers,
> > > > the new admin client would not have direct access to the controller.
> > > > This is guaranteed at the MetadataResponse level, which no longer
> > > > provides `ControllerId` to client-side requests.
> > > >
> > > > 2. The broker would forward any direct ZK path mutation requests,
> > > > including topic creation/deletion, reassignment, etc., since we
> > > > deprecate direct controller access on the client side. No protocol
> > > > version bump is necessary for the configuration requests.
> > > >
> > > > 3. To make sure forwarded requests pass authorization, the broker
> > > > principal's CLUSTER_ACTION would be allowed as an alternative
> > > > authentication method for a variety of principal operations, including
> > > > ALTER, ALTER_CONFIG, DELETE, etc. This is because the forwarded
> > > > request needs to use the proxy broker's own principal, which currently
> > > > cannot be used to authenticate many of the configuration changes
> > > > listed above. The full list can be found in the KIP.
> > > >
> > > > 4. Add a new BROKER_AUTHORIZATION_FAILURE error code to indicate an
> > > > internal security configuration failure, for when a forwarded request
> > > > fails authentication on the controller side.
> > > >
> > > > Let me know what you think. With such a major refinement of the KIP,
> > > > I'm open to a re-vote after discussions converge.
> > > >
> > > > Boyang
> > > >
> > > > On Wed, Jul 1, 2020 at 2:17 PM Boyang Chen <reluctanthero...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey folks,
> > > > >
> > > > > I have also synced with KIP-578, which handles the partition limit,
> > > > > to make sure the partition limit error code would be properly
> > > > > propagated once it is done on top of KIP-590. Let me know if you
> > > > > have further questions or concerns.
> > > > >
> > > > > Boyang
> > > > >
> > > > > On Tue, Jun 23, 2020 at 5:08 PM Boyang Chen <reluctanthero...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Thanks for the clarification, Colin and Ismael. Personally, I also
> > > > >> feel Option A is better, to prioritize fixing the gap. Just to be
> > > > >> clear, the proposed solution would be:
> > > > >>
> > > > >> 1. Bump the Metadata RPC version to return POLICY_VIOLATION. At the
> > > > >> application level, we should swap the error message with the actual
> > > > >> failure reason, such as "violation of topic creation policy when
> > > > >> attempting to auto-create internal topic through MetadataRequest."
> > > > >>
> > > > >> 2. For older Metadata RPC, return AUTHORIZATION_FAILED to fail fast.
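
To make that mapping concrete, a sketch (NEW_METADATA_VERSION is a
placeholder for the bumped RPC version, not the final number, and the old
error constant stands for whichever authorization error old clients
already handle):

    // Sketch: map the auto-creation policy failure by Metadata request version.
    static Errors autoCreatePolicyError(short requestVersion) {
        return requestVersion >= NEW_METADATA_VERSION
            ? Errors.POLICY_VIOLATION            // new clients get the real reason
            : Errors.TOPIC_AUTHORIZATION_FAILED; // old clients fail fast
    }
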
> > > > >>
> > > > >> I will address our other discussed points in the KIP as well; let
> > > > >> me know if you have further questions.
> > > > >>
> > > > >> Thanks,
> > > > >> Boyang
> > > > >>
> > > > >> On Tue, Jun 23, 2020 at 10:41 AM Ismael Juma <ism...@juma.me.uk>
> > > > >> wrote:
> > > > >>
> > > > >>> Option A is basically what I was thinking. But with a slight
> > > > >>> adjustment:
> > > > >>>
> > > > >>> New versions of MetadataResponse return POLICY_VIOLATION, old
> > > > >>> versions return AUTHORIZATION_FAILED. The latter works correctly
> > > > >>> with old Java clients (i.e. the client fails fast and propagates
> > > > >>> the error); I've tested it. Adjust new clients to treat
> > > > >>> POLICY_VIOLATION like AUTHORIZATION_FAILED, but propagate the
> > > > >>> custom error message.
> > > > >>>
> > > > >>> Ismael
> > > > >>>
> > > > >>> On Mon, Jun 22, 2020 at 11:00 PM Colin McCabe <cmcc...@apache.org>
> > > > >>> wrote:
> > > > >>>
> > > > >>> > > > > On Fri, Jun 19, 2020 at 3:18 PM Ismael Juma <ism...@juma.me.uk>
> > > > >>> > > > > wrote:
> > > > >>> > > > >
> > > > >>> > > > > > Hi Colin,
> > > > >>> > > > > >
> > > > >>> > > > > > The KIP states in the Compatibility section (not Future
> > > > >>> > > > > > work):
> > > > >>> > > > > >
> > > > >>> > > > > > "To support the proxy of requests, we need to build a
> > > channel
> > > > >>> for
> > > > >>> > > > > > brokers to talk directly to the controller. This part
> of
> > > the
> > > > >>> design
> > > > >>> > > > > > is internal change only and won’t block the KIP
> > progress."
> > > > >>> > > > > >
> > > > >>> > > > > > I am clarifying that this is not internal only, due to
> > > > >>> > > > > > the config. If we say that this KIP depends on another
> > > > >>> > > > > > KIP before we can merge it, that's fine, although it
> > > > >>> > > > > > feels a bit unnecessary.
> > > > >>> > > > > >
> > > > >>> >
> > > > >>> > Hi Ismael,
> > > > >>> >
> > > > >>> > I didn't realize there was still a reference to the separate
> > > > >>> > controller channel in the "Compatibility, Deprecation, and
> > > > >>> > Migration Plan" section.  I agree that it doesn't really belong
> > > > >>> > there.  Given that this is creating confusion, I would suggest
> > > > >>> > that we just drop this from the KIP entirely.  It really is
> > > > >>> > orthogonal to what this KIP is about-- we don't need a separate
> > > > >>> > channel to implement redirection.
> > > > >>> >
> > > > >>> > Boyang wrote:
> > > > >>> >
> > > > >>> > >
> > > > >>> > > We are only opening the doors for specific internal topics
> > > > >>> > > (offsets, txn log), so I assume the client should have no
> > > > >>> > > possibility of mutating the topic policy?
> > > > >>> > >
> > > > >>> >
> > > > >>> > Hi Boyang,
> > > > >>> >
> > > > >>> > I think you and Ismael are talking about different scenarios.
> > > > >>> > You are describing the scenario where the broker is auto-creating
> > > > >>> > the transaction log topic or consumer offset topic.  This
> > > > >>> > scenario indeed should not happen in a properly-configured
> > > > >>> > cluster.  However, Ismael is describing a scenario where the
> > > > >>> > client is auto-creating some arbitrary non-internal topic just
> > > > >>> > by sending a metadata request.
> > > > >>> >
> > > > >>> > As far as I can see, there are two solutions here:
> > > > >>> >
> > > > >>> > A. Close the hole in CreateTopicsPolicy immediately.  In new
> > > > >>> > versions, allow MetadataResponse to return AUTHORIZATION_FAILED
> > > > >>> > if we tried to auto-create a topic and failed.  Find some other
> > > > >>> > error code to return for existing versions.
> > > > >>> >
> > > > >>> > B. Keep the hole in CreateTopicsPolicy and add some
> > > > >>> > configuration to allow admins to gradually migrate to closing
> > > > >>> > it.  In practice, this probably means a configuration toggle
> > > > >>> > that enables direct ZK access, that starts off as enabled.  Then
> > > > >>> > we can eventually default it to false and then remove it
> > > > >>> > entirely over time.
> > > > >>> >
> > > > >>> > best,
> > > > >>> > Colin
> > > > >>> >
> > > > >>>
> > > > >>
> > > >
> > >
> >
>
