On Tue, Sep 29, 2020, at 17:43, Jason Gustafson wrote:
> Hey Colin,
>
> Thanks for the hard work on this proposal.
>
> I'm gradually coming over to the idea of the controllers having separate
> IDs. One of the benefits is that it allows us to separate the notion of
> controller liveness from broker liveness, which has always been a tricky
> detail. I think it's fair to say that the liveness of the controller is
> tracked through the Raft protocol while the liveness of the broker is
> tracked by the heartbeating mechanism in this KIP. This saves from having
> to deal with tricky cases like electing a controller which is not actually
> live from the perspective of heartbeating. I suspect you are right that it
> will be simpler internally if we treat them distinctly, though it will take
> some getting used to for users. It also seemingly makes it easier to
> migrate to a dedicated controller setup once a cluster gets large enough
> for it.
>
> With that said, one detail which is not very clear to me is how these two
> IDs interact with the metadata quorum. Let's say we have a broker which is
> running as both a broker and a controller. I think it must send fetches
> using `controller.id` so that the leader can track its progress and it can
> be eligible for election. Will it also fetch using `broker.id`? If we could
> avoid replicating the metadata twice, that would be preferable, but it
> seems like that will break the isolation between the controller and broker
> at some level.
>
Hi Jason,

Thanks for taking a look.

I agree that it would be good to avoid replicating the metadata twice on disk. This is one place where I think it's OK to break the isolation between the controller and the broker. This is also one benefit of separating the metadata fetches from the heartbeats -- a broker that is co-located with a controller will still send broker heartbeats to the controller quorum, but it won't perform its own metadata fetches, since the controller part of the process is already doing that and sharing the results.

>
> There is another option I was thinking about for the sake of discussion.
> Suppose we say that controllers always persist the metadata log and brokers
> never do. A broker would always start from the latest snapshot, but we can
> allow it to fetch from the "nearest" controller (which might be local if
> `process.roles` is set to both controller and broker). If users want to
> have the metadata log replicated to all nodes, then they can make each node
> a controller. It's fine to have controllers that are not voters since they
> can be observers. They replicate and persist the metadata log, but they
> do not take part in elections, though they would be available for observer
> promotion when we get around to completing the raft reassignment work.
>

That's an interesting idea. However, I don't think you would want every node to maintain the controller data structures in memory. Those data structures are optimized for taking over as the active controller if necessary. That's different from the MetadataCache on the brokers, which is optimized for being accessed concurrently by lots of request handler threads.

>
> On a related note, I'm still slightly in favor of unifying the controller
> listener into the existing `listeners` configuration. I think we would
> agree that separating the controller listener should at least be considered
> a best practice in a secure configuration, but I am not sure about the case
> for mandating it. I'm sympathetic to the idea that we should be opinionated
> about this, but it would be helpful if the KIP documented what exactly we
> are getting out of it. My concern is basically that it hurts usability.
>

I don't think having separate ports hurts usability. Anyone using a quickstart Docker image or default configuration can just leave this as the default, which will be something like 0.0.0.0:9093. Users are already setting controller.id and node.roles differently on controller nodes in any case.

Sharing the same port doesn't really work. You wouldn't know whether a connecting client wanted to reach the broker or the controller. You also couldn't apply tighter security to the controller than to the broker (or indeed any security settings that differ between the two), and the same goes for metrics, backpressure, quotas, and so on. There is also the issue of clients flooding the controller port; there's no reason to expose ourselves to that. It's just a trap for novice admins, which is actually a negative for usability, I think.
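To make this concrete, here is roughly what I'd expect the configuration of a combined node to look like. The property names below are just placeholders drawn from this discussion -- in particular, "controller.listeners" is only an illustrative name, not something the KIP has settled on:

    # A node running both a broker and a controller (names illustrative, not final)
    node.roles=broker,controller
    broker.id=1
    controller.id=100

    # Listener for clients and inter-broker traffic
    listeners=PLAINTEXT://0.0.0.0:9092

    # Separate listener and port for the controller, so it can have its own
    # security settings, quotas, and metrics, and so clients can't flood it
    controller.listeners=CONTROLLER://0.0.0.0:9093

The point is that the controller side of the process gets its own port and its own listener configuration, even when it shares a JVM with a broker.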
>
> By the way, in KIP-595 we tended to favor `quorum` as the prefix for
> configurations related to the management of the metadata quorum. None of
> these configs have been exposed yet, so we can still change them. I think
> `controller` is probably more descriptive, so if you're in agreement, we
> can amend KIP-595 so that it uses the `controller` prefix. However, I think
> it's important to redefine `controller.connect` so that it is an explicit
> definition of the set of metadata voters. Right now it reads more like a
> bootstrap url. Since we decided to take dynamic quorum configuration out of
> the initial version, we need the voter set to be defined explicitly in
> static configuration. We used `quorum.voters` in KIP-595.
>

Yeah, naming is hard. I think ideally we would name those configs in such a way that we leave the door open to having separate configs for non-metadata Raft topics. I like "metadata.quorum." as a prefix for the KIP-595 configs. There's a rough sketch of how that might look at the very bottom of this mail. I'm curious what you folks think.

best,
Colin

>
> Best,
> Jason
>
> On Tue, Sep 29, 2020 at 5:21 PM Jose Garcia Sancio <jsan...@confluent.io>
> wrote:
>
> > Hi Jun and Colin,
> >
> > Some comments below.
> >
> > > 62.3 We added some configs in KIP-595 prefixed with "quorum" and we plan
> > > to add some controller specific configs prefixed with "controller".
> > > KIP-630 plans to add some other controller specific configs with no
> > > prefix. Should we standardize all controller specific configs with the
> > > same prefix?
> >
> > I agree that consistency in all of the new properties is really
> > important to improve the user's experience with Kafka. After some
> > discussion in the review of KIP-630, the configuration names start
> > with "metadata".
> >
> > metadata.snapshot.min.cleanable.ratio
> > metadata.lbo.lag.time.max.ms
> >
> > The reason for the change is that this configuration affects both the
> > controller component and the metadata cache component as users of the
> > "cluster metadata" topic partition. I think the new names match
> > Kafka's existing configuration pattern for the transaction and
> > consumer offset topic partitions.
> >
> > Thanks!
> > --
> > -Jose
> >
>
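P.S. Here is the sketch I mentioned above of how a "metadata.quorum." prefix could sit alongside the KIP-630 configs Jose listed. All names and values here are hypothetical; they are only meant to make the naming discussion concrete:

    # Explicit, static definition of the metadata quorum voters
    # (replacing the bootstrap-style controller.connect); the format is illustrative
    metadata.quorum.voters=100@controller1:9093,101@controller2:9093,102@controller3:9093

    # KIP-630 configs, already using the "metadata" prefix
    # (values are arbitrary examples, not proposed defaults)
    metadata.snapshot.min.cleanable.ratio=0.5
    metadata.lbo.lag.time.max.ms=60000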