Thanks, David, for the feedback!

On Wed, Jul 29, 2020 at 7:53 AM David Jacot <dja...@confluent.io> wrote:
> Hi Colin, Boyang,
>
> Colin, thanks for the clarification. Somehow, I thought that even if the controller is run independently, it would still run the listeners of the broker and thus would be accessible by redirecting on the loopback interface. My mistake.
>
> Boyang, I have a few questions/comments regarding the updated KIP:
>
> 1. I think it would be great if we could clarify how old admin clients which talk directly to the controller will work with this KIP. I read between the lines that, as we propose to provide a random broker Id as the controller Id in the metadata response, they will use a single node as a proxy. Is that correct? This deserves to be called out more explicitly in the design section instead of being hidden in the protocol bump of the metadata RPC.

Makes sense, I've stressed this point in the compatibility section.

> 1.1 If I understand correctly, could we assume that old admin clients will stick to the same "fake controller" until they refresh their metadata? Refreshing the metadata usually happens when NOT_CONTROLLER is received, but this won't happen anymore, so they should change infrequently.

That is correct. Old admin clients would not try to refresh their metadata due to NOT_CONTROLLER, which can no longer happen with the new broker cluster.

> 2. For the new admin client, I suppose that we plan on using LeastLoadedNodeProvider for the requests that are using ControllerNodeProvider. We could perhaps mention it.

Sure, added.

> 3. Pre KIP-500, will we have a way to distinguish whether a request received by the controller is coming directly from a client or from a broker? You mention that the listener can be used to do this, but as you pointed out, it is not mandatory. Do we have another reliable method? I am asking in the context of KIP-599: with the current controller, we may need to throttle differently depending on whether the request comes from a client or from a broker.

The point of using the listener name is more for security, to detect forged requests on a best-effort basis. For throttling, I think we could just check the request header for the existence of *InitialClientId* to decide whether to apply the forwarded-request or the direct-request throttling strategy.
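To make that check concrete, here is a minimal sketch, assuming the *InitialClientId* header field discussed in point 4 below gets added (the class and method names are hypothetical, not the KIP's actual API):

    // Hypothetical stand-ins, not Kafka's actual request classes.
    final class RequestHeader {
        final String clientId;
        final String initialClientId; // null unless the request was forwarded

        RequestHeader(String clientId, String initialClientId) {
            this.clientId = clientId;
            this.initialClientId = initialClientId;
        }
    }

    final class QuotaRouter {
        /**
         * A forwarded request carries the original client's id in
         * InitialClientId, so throttle against that client's quota rather
         * than against the proxy broker's own client id.
         */
        static String quotaEntityFor(RequestHeader header) {
            return header.initialClientId != null
                    ? header.initialClientId
                    : header.clientId;
        }
    }

A principal-based quota would branch the same way on the forwarded principal field.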
> 4. Could we add `InitialClientId` as well? This will be required for the quotas, as we can apply them by principal and/or clientId.

Sounds good, added.

> 5. A small remark regarding the structure of the KIP. It is a bit weird that the requests that do not go to the controller are mentioned in the Proposed Design section while the requests that do go to the controller are mentioned in Public Interfaces. When one reads the Proposed Design, one does not get a full picture of the whole new routing proposal for old and new clients. It would be great if we could have a full overview in that section.

Good point, I will move the pieces around.

> Overall the change makes sense to me. I will work on drafting an addendum to KIP-599 to alter the design to cope with these changes. At first glance, that seems doable if 1.1, 3 and 4 are OK.

Thank you for the help!

> Thanks,
> David
>
> On Wed, Jul 29, 2020 at 5:29 AM Boyang Chen <reluctanthero...@gmail.com> wrote:
>
> > Thanks for the feedback, Colin!
> >
> > On Tue, Jul 28, 2020 at 2:11 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > Hi Boyang,
> > >
> > > Thanks for updating this. A few comments below:
> > >
> > > In the "Routing Request Security" section, there is a reference to "older requests that need redirection." But after these new revisions, both new and old requests need redirection. So we should rephrase this.
> > >
> > > > In addition, to avoid exposing this forwarding power to the admin clients, the routing request shall be forwarded towards the controller broker internal endpoint, which should be visible only to other brokers inside the cluster in the KIP-500 controller. Any admin configuration request with a broker principal should not go through the public endpoint and will be rejected for security purposes.
> > >
> > > We should also describe how this will work in the pre-KIP-500 case. In that case, CLUSTER_ACTION gets the extra permissions described here only when the message comes in on the inter-broker listener. We should state that here.
> > >
> > > (I can see that you have this information later on in the "Security Access Changes" section, but it would be good to have it here as well, to avoid confusion.)
> > >
> > > > To be more strict about protecting controller information, the "ControllerId" field in the new MetadataResponse shall be set to -1 when the original request comes from a non-broker client and is already on v10. We shall use the request listener name to distinguish whether a given request is inter-broker or from a client.
> > >
> > > I'm not sure why we would need to distinguish between broker clients and non-broker clients. Brokers don't generally send MetadataRequests to other brokers, do they? Brokers learn about metadata from UpdateMetadataRequest and LeaderAndIsrRequest, not by sending MetadataRequests to other brokers.
> >
> > We do have one use case where the MetadataRequest gets sent between brokers, which is the InterBrokerSendThread. Currently we don't rely on it to get the controller id, so I guess your suggestion should be good to enforce. We could add a comment on the NetworkClient noting that it should not be used to get the controller location.
>
> > > Probably what we want here is: v0-v9: return a random broker in the cluster as the controller ID. v10: no controllerId present in the MetadataResponse. We should also deprecate the adminClient method which gets the controllerId.
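That version gating might look roughly like this (a sketch only; the constant and method names are invented, not the actual broker code):

    // Sketch: how a MetadataResponse could populate its controller id field.
    final class ControllerIdGate {
        static final int NO_CONTROLLER_ID = -1; // hypothetical sentinel

        /**
         * v0-v9 clients route controller-bound requests to whichever node the
         * response names as the controller, so hand them a live broker to act
         * as a proxy; v10+ clients never see a controller id at all.
         */
        static int controllerIdFor(short requestVersion, int randomAliveBrokerId) {
            return requestVersion >= 10 ? NO_CONTROLLER_ID : randomAliveBrokerId;
        }
    }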
> > > > BROKER_AUTHORIZATION_FAILURE(92, "Authorization failed for the request during forwarding, this indicates an internal error on the broker cluster security setup.", BrokerAuthorizationFailureException::new);
> > >
> > > Grammar nitpick: It would be good to have a period between "forwarding" and "this" to avoid a run-on sentence :)
> > >
> > > best,
> > > Colin
> > >
> > > On Mon, Jul 27, 2020, at 21:47, Boyang Chen wrote:
> > > > Hey there,
> > > >
> > > > I'm re-opening this thread because, after some initial implementation work started, we spotted some gaps in the approved KIP as well as some inconsistencies with the KIP-631 controller. There are a couple of addendums to the existing KIP, specifically:
> > > >
> > > > 1. As the controller is foreseen to be accessible only to the brokers, the new admin client would not have direct access to the controller. This is guaranteed at the MetadataResponse level, which no longer provides `ControllerId` to client-side requests.
> > > >
> > > > 2. The broker would forward any direct ZK path mutation requests, including topic creation/deletion, reassignment, etc., since we deprecate direct controller access on the client side. No further protocol version bump is necessary for the configuration requests.
> > > >
> > > > 3. To make sure forwarded requests pass authorization, the broker principal CLUSTER_ACTION would be allowed as an alternative authentication method for a variety of principal operations, including ALTER, ALTER_CONFIG, DELETE, etc. This is because the forwarded request needs to use the proxy broker's own principal, which currently cannot be used to authorize many of the configuration changes listed above. The full list can be found in the KIP.
> > > >
> > > > 4. Add a new BROKER_AUTHORIZATION_FAILURE error code to indicate any internal security configuration failure, when the forwarded request fails authentication on the controller side.
> > > >
> > > > Let me know what you think. With such a major refinement of the KIP, I'm open to a re-vote after discussions converge.
> > > >
> > > > Boyang
> > > >
> > > > On Wed, Jul 1, 2020 at 2:17 PM Boyang Chen <reluctanthero...@gmail.com> wrote:
> > > >
> > > > > Hey folks,
> > > > >
> > > > > I have also synced on KIP-578, which handles the partition limit, to make sure the partition limit error code will be properly propagated once it is done on top of KIP-590. Let me know if you have further questions or concerns.
> > > > >
> > > > > Boyang
> > > > >
> > > > > On Tue, Jun 23, 2020 at 5:08 PM Boyang Chen <reluctanthero...@gmail.com> wrote:
> > > > >
> > > > > > Thanks for the clarification, Colin and Ismael. Personally, I also feel Option A is better, to prioritize fixing the gap. Just to be clear, the proposed solution would be:
> > > > > >
> > > > > > 1. Bump the Metadata RPC version to return POLICY_VIOLATION. At the application level, we should swap the error message with the actual failure reason, such as "violation of topic creation policy when attempting to auto-create an internal topic through MetadataRequest."
> > > > > >
> > > > > > 2. For older Metadata RPC versions, return AUTHORIZATION_FAILED to fail fast.
> > > > > >
> > > > > > I will address our other discussed points in the KIP as well; let me know if you have further questions.
> > > > > >
> > > > > > Thanks,
> > > > > > Boyang
> > > > > >
> > > > > > On Tue, Jun 23, 2020 at 10:41 AM Ismael Juma <ism...@juma.me.uk> wrote:
> > > > > >
> > > > > > > Option A is basically what I was thinking, but with a slight adjustment:
> > > > > > >
> > > > > > > New versions of MetadataResponse return POLICY_VIOLATION, old versions return AUTHORIZATION_FAILED. The latter works correctly with old Java clients (i.e. the client fails fast and propagates the error); I've tested it. Adjust new clients to treat POLICY_VIOLATION like AUTHORIZATION_FAILED, but propagate the custom error message.
> > > > > > >
> > > > > > > Ismael
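Sketching that version-dependent mapping with a self-contained stand-in (this is not Kafka's real Errors enum, and the version constant is hypothetical):

    // Sketch of the error mapping for auto-create failures in MetadataResponse.
    final class AutoCreateErrorMapping {
        enum Error { POLICY_VIOLATION, AUTHORIZATION_FAILED }

        // Hypothetical: first MetadataRequest version that knows the new error.
        static final short FIRST_POLICY_AWARE_VERSION = 10;

        /**
         * Old clients only fail fast (and surface the message) on
         * AUTHORIZATION_FAILED; new clients treat POLICY_VIOLATION the same
         * way but keep the custom message explaining the policy violation.
         */
        static Error forVersion(short metadataRequestVersion) {
            return metadataRequestVersion >= FIRST_POLICY_AWARE_VERSION
                    ? Error.POLICY_VIOLATION
                    : Error.AUTHORIZATION_FAILED;
        }
    }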
> > > > > > >
> > > > > > > On Mon, Jun 22, 2020 at 11:00 PM Colin McCabe <cmcc...@apache.org> wrote:
> > > > > > >
> > > > > > > > > > > On Fri, Jun 19, 2020 at 3:18 PM Ismael Juma <ism...@juma.me.uk> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > >
> > > > > > > > > > > > The KIP states in the Compatibility section (not Future work):
> > > > > > > > > > > >
> > > > > > > > > > > > "To support the proxy of requests, we need to build a channel for brokers to talk directly to the controller. This part of the design is internal change only and won't block the KIP progress."
> > > > > > > > > > > >
> > > > > > > > > > > > I am clarifying that this is not internal-only, due to the config. If we say that this KIP depends on another KIP before we can merge it, that's fine, although it feels a bit unnecessary.
> > > > > > > >
> > > > > > > > Hi Ismael,
> > > > > > > >
> > > > > > > > I didn't realize there was still a reference to the separate controller channel in the "Compatibility, Deprecation, and Migration Plan" section. I agree that it doesn't really belong there. Given that this is creating confusion, I would suggest that we just drop this from the KIP entirely. It really is orthogonal to what this KIP is about; we don't need a separate channel to implement redirection.
> > > > > > > >
> > > > > > > > Boyang wrote:
> > > > > > > >
> > > > > > > > > We are only opening the doors for specific internal topics (offsets, txn log), so I assume the client should have no possibility to mutate the topic policy?
> > > > > > > >
> > > > > > > > Hi Boyang,
> > > > > > > >
> > > > > > > > I think you and Ismael are talking about different scenarios. You are describing the scenario where the broker is auto-creating the transaction log topic or the consumer offsets topic. That scenario indeed should not happen in a properly configured cluster. However, Ismael is describing a scenario where the client is auto-creating some arbitrary non-internal topic just by sending a metadata request.
> > > > > > > >
> > > > > > > > As far as I can see, there are two solutions here:
> > > > > > > >
> > > > > > > > A. Close the hole in CreateTopicsPolicy immediately. In new versions, allow MetadataResponse to return AUTHORIZATION_FAILED if we tried to auto-create a topic and failed. Find some other error code to return for existing versions.
> > > > > > > >
> > > > > > > > B. Keep the hole in CreateTopicsPolicy and add some configuration to allow admins to gradually migrate to closing it. In practice, this probably means a configuration toggle that enables direct ZK access and that starts off enabled. Then we can eventually default it to false and remove it entirely over time.
> > > > > > > >
> > > > > > > > best,
> > > > > > > > Colin
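For reference, a minimal sketch of what closing the hole in solution A amounts to, using stand-in types rather than the real CreateTopicsPolicy interface: the metadata-driven auto-create path runs the same policy check that the explicit CreateTopics path already runs.

    // Stand-in types for illustration; not the actual broker code.
    interface TopicCreationPolicy {
        /** Throws if the topic would violate the cluster's creation policy. */
        void validate(String topic, int numPartitions) throws PolicyViolationException;
    }

    class PolicyViolationException extends Exception {
        PolicyViolationException(String message) { super(message); }
    }

    final class AutoTopicCreator {
        private final TopicCreationPolicy policy;

        AutoTopicCreator(TopicCreationPolicy policy) {
            this.policy = policy;
        }

        /**
         * Called when a MetadataRequest names a missing topic and
         * auto-creation is enabled. Validating first means a policy breach
         * surfaces as an error in the MetadataResponse instead of the topic
         * being created around the policy.
         */
        void createIfAllowed(String topic, int numPartitions) throws PolicyViolationException {
            policy.validate(topic, numPartitions);
            // ...proceed with the actual creation only after the policy passes
        }
    }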