On Tue, Jul 7, 2020, at 18:27, Ron Dagostino wrote:
> Hi Colin. Thanks for the KIP. Here is some feedback and various questions.
Hi Ron,

Thanks for taking a look!

> "*Controller processes will listen on a separate port from brokers. This
> will be true even when the broker and controller are co-located in the same
> JVM*". I assume it is possible that the port numbers could be the same when
> using separate JVMs (i.e. broker uses port 9192 and controller also uses
> port 9192). I think it would be clearer to state this along these lines:
> "Controller nodes will listen on a port, and the controller port must
> differ from any port that a broker in the same JVM is listening on. In
> other words, a controller and a broker node, when in the same JVM, do not
> share ports"

I changed it to "Controller processes will listen on a separate endpoint
from brokers" to avoid confusion.

> I think the sentence "*In the realm of ACLs, this translates to controllers
> requiring CLUSTERACTION on CLUSTER for all operations*" is confusing. It
> feels to me that you can just delete it. Am I missing something here?

Hmm. I think it is important to be specific about what ACLs we plan to
require. CLUSTERACTION on CLUSTER is what we generally use for
broker-to-broker operations. I'm not sure how to improve this. Maybe a
link to some background material would help?

> The KIP states "*The metadata will be stored in memory on all the active
> controllers.*" Can there be multiple active controllers? Should it
> instead read "The metadata will be stored in memory on all potential
> controllers." (or something like that)?

Oh, this should have just been "The metadata will be stored in memory on
all the controllers." Fixed.

> KIP-595 states "*we have assumed the name __cluster_metadata for this
> topic, but this is not a formal part of this proposal*". This KIP-631
> states "*Metadata changes need to be persisted to the __metadata log
> before we propagate them to the other nodes in the cluster.
> This means waiting for the metadata log's last stable offset to advance to
> the offset of the change.*" Are we here formally defining "__metadata" as
> the topic name, and should these sentences refer to "__metadata topic"
> rather than "__metadata log"?

I think the place to have the discussion about the name of the metadata
topic is here. The other KIPs only refer to this tangentially. We've had a
few suggestions in the past, but the current one in the doc is
__kafka_metadata. Thinking about it a bit more, though, maybe we should use
something that isn't valid as a normal topic name, to avoid conflicts.
Perhaps ~metadata would be good.

> What are the "other nodes in the cluster" that are referred to? These are
> not controller nodes but brokers, right?

It's intended to be all nodes: controllers and brokers. After all, the
other controllers also need to mirror the metadata.

> If so, then should we say "before we propagate them to the brokers"?
> Technically we have a controller cluster and a broker cluster -- two
> separate clusters, correct? (Even though we could potentially share JVMs
> and therefore require no additional processes.) If the statement is
> referring to nodes in both clusters then maybe we should state "before we
> propagate them to the other nodes in the controller cluster or to brokers."
>
> "*The controller may have several of these uncommitted changes in flight at
> any given time. In essence, the controller's in-memory state is always a
> little bit in the future compared to the current state. This allows the
> controller to continue doing things while it waits for the previous changes
> to be committed to the Raft log.*" Should the three references above be to
> the active controller rather than just the controller?

Good point. I fixed this to refer to the "active controller."
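To make that "a little bit in the future" idea concrete, here is a rough
sketch in Python. This is purely illustrative -- it is not the actual
controller code, and all of the class and method names are made up -- but
it shows the basic pattern: changes are applied to the active controller's
in-memory state immediately, while only changes at or below the log's last
stable offset become part of the visible, committed state.

```python
class ActiveController:
    """Illustrative sketch only -- not real controller code."""

    def __init__(self):
        self.state = {}        # committed ("current") state, visible to others
        self.pending = []      # (offset, key, value) changes not yet committed
        self.next_offset = 0

    def propose(self, key, value):
        """Apply a change to the in-memory 'future' state right away,
        but remember it as uncommitted. Returns the change's log offset."""
        offset = self.next_offset
        self.next_offset += 1
        self.pending.append((offset, key, value))
        return offset

    def future_state(self):
        """What the active controller itself sees: committed state plus
        all in-flight changes."""
        view = dict(self.state)
        for _, key, value in self.pending:
            view[key] = value
        return view

    def handle_commit(self, last_stable_offset):
        """Called when the metadata log's last stable offset advances.
        Only now do pending changes become visible 'current' state."""
        still_pending = []
        for offset, key, value in self.pending:
            if offset <= last_stable_offset:
                self.state[key] = value
            else:
                still_pending.append((offset, key, value))
        self.pending = still_pending
```

The point of the pattern is that the active controller can keep several
changes in flight at once, while anyone else only ever observes state that
has been durably committed to the metadata log.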
> "*Therefore, the controller must not make this future state "visible" to
> the rest of the cluster until it has been made persistent – that is, until
> it becomes current state*". Again I wonder if this should refer to "active"
> controller, and indicate "anyone else" as opposed to "the rest of the
> cluster" since we are talking about 2 clusters here?

Fixed.

> "*When the active controller decides that it itself should create a
> snapshot, it will first try to give up the leadership of the Raft quorum.*"
> Why? Is it necessary to state this? It seems like it might be an
> implementation detail rather than a necessary constraint/requirement that
> we declare publicly and would have to abide by.

It seems in scope here, since it describes how the controller interacts
with the snapshotting mechanism. An implementation detail would be
something more like a class name or code, I think.

> "*It will reject brokers whose metadata is too stale*". Why? An example
> might be helpful here.

There's a bit more context in KIP-500. Stale metadata is a general problem
since it can confuse clients -- they may not see a topic which actually
does exist, for example, or may see one that was deleted.

> "*it may lose subsequent conflicts if its broker epoch is stale*" This is
> the first time a "broker epoch" is mentioned. I am assuming it is the
> controller epoch communicated to it (if any). It would be good to
> introduce it / explicitly state what it is before referring to it.

Broker epoch was first described in KIP-380. I added a link to that KIP
for some background. ( https://cwiki.apache.org/confluence/x/-oGzBQ )

best,
Colin

> Ron
>
> On Tue, Jul 7, 2020 at 6:48 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> > Hi all,
> >
> > I posted a KIP about how the quorum-based controller envisioned in
> > KIP-500 will work. Please take a look here:
> > https://cwiki.apache.org/confluence/x/4RV4CQ
> >
> > best,
> > Colin