Thank you for the KIP Colin. In general, I like the idea of having the ability to interact directly with the controllers. I agree with your observation that it helps in situations where you would want to get data directly from the controller instead of going via a broker. I have some general comments but the main concern I have is with the piggy-backing of error code with response of __cluster_metadata topic.
1. With this change, how are we guarding against the possibility of misbehaving client traffic from disrupting the controller (that you mentioned as a motivation of earlier behaviour)? One solution could be to have default values set for request throttling on the controller. 2. This KIP also increases the network attack surface area. Prior to this KIP, it was possible to have firewall rules setup for controllers such that only the brokers can talk to it. But now, the controller would require access from other endpoints other than brokers as well. Can we add a suggestion to the upgrade documentation and inform users 3. In section KRaft Controller MetadataResponse, row 3, "There is no top-level error code in MetadataResponse, so we use the __cluster_metadata topic to send back our error.". This will definitely confuse the users. Can we introduce a top level error field instead? 4. As part of the KIP, could we please add some documentation for users with the suggestion of when to get information directly from the controller and when not to (and associated tradeoffs)? 5. Why do we need the `FromKRaftController` field in the response? What do we expect the users to do with this information? 6. Can we drop `KRaft` from the fields `FromKRaftController` and `DirectToKRaftControllerQuorum`? My suggestion will be to rename it as `DirectToController`. 7. "kafka-metadata-shell.sh is at an "evolving" level of interface stability" -> I thought that with KRaft being production ready, the evolving mode for kraft-related tools has also moved to production. Do we have a timeline when this would move to production? -- Divij Vaidya On Tue, Apr 25, 2023 at 1:38 AM Colin McCabe <cmcc...@apache.org> wrote: > On Fri, Apr 21, 2023, at 14:17, Jason Gustafson wrote: > > Hey Colin, > > > > The KIP makes sense overall. Nice to clarify the contract between clients > > and the controllers. The use of `DirectToKRaftControllerQuorum` will help > > prevent misconfiguration. In fact, I wonder if we can return a fatal > error > > instead of NOT_CONTROLLER so that the client would immediately fail. For > > example, could we use INVALID_REQUEST or something like that? Either that > > or we need to make sure clients treat NOT_CONTROLLER as a fatal error. > > Without that, it would probably get picked up with default retry logic > and > > the user might not see the problem. > > Hi Jason, > > Yes, this is a good point. It should return INVALID_REQUEST since that is > not retryable. I'll change it. > > > > > I also wonder if we can relax the requirements on the Metadata request so > > that we can use it to list topics and partition state (e.g. URPs). It > > would be useful to query the controllers as the metadata source of truth > > when we suspect that the broker metadata may have diverged. > > > > Fair enough. I will document that we can return topics. > > I also added a "future work" section about maybe using the controllers as > bootstrap servers for the broker cluster. To be clear, that is NOT in scope > here, but it's interesting to think about potentially doing in the future. > The major problem is what to do when the broker endpoints we're returning > have different security settings from the controller endpoint the client > initially talked to. > > best, > Colin > > > > > > Thanks, > > Jason > > > > On Thu, Apr 20, 2023 at 5:53 PM Colin McCabe <cmcc...@apache.org> wrote: > > > >> On Wed, Apr 19, 2023, at 20:56, Philip Nee wrote: > >> > Hey Colin, > >> > > >> > I still need to finish reading and understanding the KIP, but I have a > >> > couple of comments despite being ignorant of most of the KRaft stuff. > >> > (Sorry!) > >> > > >> > Firstly, does it make sense to create an extension of the current > >> > AdminClient only to handle these specific KRaft use cases? It seems > >> > cumbersome to have two sets of bootstrap configurations to make the > >> > AdminClient generic enough to handle these specific cases, instead, > maybe > >> > it is more obvious (to me) to just extend the AdminClient. What I'm > >> > thinking is KraftAdminClient which continuously uses > *bootstrap.servers*, > >> > but make this class only serves the Kraft controllers APIs. > >> > >> Hi Philip, > >> > >> Thanks for taking a look. > >> > >> We would not want to create a new Admin client class in order to > >> communicate directly with the controllers. The RPCs accepted by the > >> controllers have the same format as the those accepted by the brokers. > >> There is no difference in what is sent over the wire or the data > structures > >> that are used in the client. > >> > >> I know you mentioned you haven't had time to read all the KRaft stuff > (and > >> there is a lot, I understand). But this is one area that would probably > be > >> clarified if you were able to read at least KIP-500 and KIP-590. RPCs > sent > >> to the broker are forwarded to the controller (mostly) without > modification. > >> > >> > > >> > Secondly, if we want to continue with the design, I'm not yet sure > why we > >> > can't continue using the *bootstrap.servers*? I assume when the client > >> gets > >> > the metadata, it should know who it is talking to. I'm just > reconsidering > >> > your alternative again. > >> > > >> > A bad idea: Why don't we continue using *bootstrap.servers* but have a > >> > separated config like *kraft.controller* = true/false. I feel like > most > >> > users might not know what is a controller and causes some mistakes > down > >> the > >> > road. > >> > > >> > >> Well, you called it a bad idea, and I can't help but agree! :) > >> > >> I think "the user might not know what a controller is" is a good sign > that > >> they shouldn't be interacting with the controller. Like many AdminClient > >> APIs, this one is intended for use by administrators only. Most users > >> indeed should not need to know what a controller is or interact with it > >> directly. If they do interact it should be very clear that they are > doing > >> so. The --controller-bootstrap flag makes it very clear, as does the > >> separate configuration. > >> > >> Let me give an example of the kind of problems that arise if you want to > >> reuse bootstrap.servers for this purpose. > >> > >> If the user grasb a 3.4 Kafka AdminClient and set bootstrap.servers to a > >> set of controller servers, and set direct.to.controller to true, the > >> unknown (in 3.4) configuration will be ignored, and a normal metadata > >> request will be sent without the direct to controller flag. In that > case it > >> will give back an error. Confusing, right? > >> > >> Using controller.servers clarifies this situation because the 3.4 client > >> will not support that config, and will complain about the lack of > >> bootstrap.servers. > >> > >> Actually, the situation could get even more confusing than what I > >> described since some older preproduction versions of the KRaft > controller > >> did implement the METADATA RPC. So if you send them a METADATA request > >> without any special information, it's not clear what you'll get. > Indeed, > >> it would be dependent on the client version and the controller version. > >> > >> The bottom line is that reusing the bootstrap.servers configuration here > >> is not a good idea. > >> > >> best, > >> Colin > >> > >> > Thanks, > >> > P > >> > > >> > On Wed, Apr 19, 2023 at 2:18 PM Colin McCabe <cmcc...@apache.org> > wrote: > >> > > >> >> Hi all, > >> >> > >> >> I wrote a short KIP about allowing AdminClient to talk directly with > the > >> >> KRaft controller quorum. Check it out here: > >> >> > >> >> https://cwiki.apache.org/confluence/x/Owo0Dw > >> >> > >> >> best, > >> >> Colin > >> >> > >> >