Hi José,

Thanks for the KIP! I have not had time to fully digest it, but I had some initial questions:

1. It seems like the proposal is to have a UUID per partition directory on the voter. If I understand correctly, this is sometimes referred to as "VoterUUID" and sometimes as "ReplicaUUID." The latter seems more accurate, since a single voter could have multiple of these IDs in a situation where we had multiple Raft topics. So it would be good to standardize on that. Also, I didn't see a description of how this would be stored in the log directory. That would be good to add.

2. When we originally did the Raft and Quorum Controller KIPs, one contentious topic was node IDs. We eventually settled on the idea that broker and controller IDs were in the same ID space. So you can't (for example) have a broker 3 that is in a separate JVM from controller 3. This is pretty easy to enforce with a static configuration, but it seems like it will be harder to do dynamically. I would like to keep this invariant. This probably requires us to reject attempts to add a new quorum voter which duplicates a broker ID (except in the special case of co-location!). Similarly, we should reject broker registrations that duplicate an unrelated controller ID. The broker's incarnation ID is the key to doing this, I think. But that requires us to send the incarnation ID in many of these RPCs.

3. Is it really necessary to put the endpoint information into the AddVoterRecord? It seems like that could be figured out at runtime, like we do today. If we do need it, it seems particularly weird for it to be per-partition (will we have a separate TCP port for each Raft partition?). I also don't know why we'd want multiple endpoints. We have that for the broker because the endpoints have different uses, but that isn't the case here. The original rationale for multiple endpoints on the controllers was to support migration from PLAINTEXT to SSL (or whatever). But that only requires multiple listeners to be active on the receive side, not the send side. A single voter never needs more than one endpoint to contact a peer. Overall, I think we'd be better off keeping this as soft state rather than adding it to the log. Particularly if it's not in the log at all for the static configuration case...

4. How do you get from the static configuration situation to the dynamic one? Can it be done with a rolling restart? I think the answer is yes, but I wasn't quite sure on reading. Does a leader using the static configuration auto-remove voters that aren't in that static config, as well as auto-add? The adding behavior is spelled out, but not removing (or maybe I missed it).

I've appended a few rough sketches below the quoted mail to make questions 1 through 3 more concrete.

best,
Colin

On Thu, Jul 21, 2022, at 09:49, José Armando García Sancio wrote:
> Hi all,
>
> I would like to start the discussion on my design to support
> dynamically changing the set of voters in the KRaft cluster metadata
> topic partition.
>
> KIP URL: https://cwiki.apache.org/confluence/x/nyH1D
>
> Thanks!
> -José
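For question 1, a minimal sketch of the kind of description I'm looking for: how a per-directory replica UUID could be generated and persisted, in the style of meta.properties. To be clear, the file name and property key below are my invention, not something from the KIP.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.UUID;

// Sketch: persist a per-log-directory replica UUID next to the partition
// data, analogous to how meta.properties stores node-level metadata today.
// The file name "replica-uuid.properties" is hypothetical.
public class ReplicaUuidStore {
    private static final String FILE_NAME = "replica-uuid.properties";

    // Load the replica UUID for this log directory, generating and
    // persisting a fresh one on first use.
    public static UUID loadOrCreate(Path logDir) throws IOException {
        Path file = logDir.resolve(FILE_NAME);
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (var in = Files.newInputStream(file)) {
                props.load(in);
            }
            return UUID.fromString(props.getProperty("replica.uuid"));
        }
        UUID uuid = UUID.randomUUID();
        props.setProperty("replica.uuid", uuid.toString());
        try (var out = Files.newOutputStream(file)) {
            props.store(out, "replica uuid for this log directory");
        }
        return uuid;
    }
}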
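For question 2, this is the shape of the uniqueness check I'm imagining when the leader handles a request to add a voter. All of the types and names here are hypothetical; the point is that the incarnation ID is what lets us distinguish co-location from an ID collision.

import java.util.Map;
import java.util.UUID;

// Sketch: broker and controller IDs share one ID space, so an AddVoter
// request may only reuse a broker ID when it comes from the same node
// (co-location), which the incarnation ID lets us prove.
public class VoterIdValidator {
    // node ID -> incarnation ID of the currently registered broker
    private final Map<Integer, UUID> registeredBrokers;

    public VoterIdValidator(Map<Integer, UUID> registeredBrokers) {
        this.registeredBrokers = registeredBrokers;
    }

    // Reject the new voter if its ID collides with a broker registration
    // from a different JVM. The symmetric check (rejecting a broker
    // registration that collides with an unrelated controller) would look
    // the same with the roles swapped.
    public boolean canAddVoter(int voterId, UUID voterIncarnationId) {
        UUID brokerIncarnation = registeredBrokers.get(voterId);
        if (brokerIncarnation == null) {
            return true; // no broker registered with this ID; no conflict
        }
        // Equal incarnation IDs mean the voter is co-located with the broker.
        return brokerIncarnation.equals(voterIncarnationId);
    }
}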
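For question 3, "figured out at runtime, like we do today" refers to deriving endpoints from configuration rather than from the log. A sketch of parsing the existing controller.quorum.voters format ("id@host:port,..."), which yields exactly one endpoint per voter:

import java.util.HashMap;
import java.util.Map;

// Sketch: derive voter endpoints from the static configuration at runtime.
// controller.quorum.voters looks like "1@host1:9093,2@host2:9093,3@host3:9093".
public class QuorumVotersParser {
    public record Endpoint(String host, int port) {}

    public static Map<Integer, Endpoint> parse(String quorumVoters) {
        Map<Integer, Endpoint> endpoints = new HashMap<>();
        for (String entry : quorumVoters.split(",")) {
            String[] idAndAddress = entry.trim().split("@", 2);
            int id = Integer.parseInt(idAndAddress[0]);
            // Use the last ':' so bracketed IPv6 hosts still parse.
            int colon = idAndAddress[1].lastIndexOf(':');
            String host = idAndAddress[1].substring(0, colon);
            int port = Integer.parseInt(idAndAddress[1].substring(colon + 1));
            endpoints.put(id, new Endpoint(host, port));
        }
        return endpoints;
    }
}

A dynamic version could keep the same one-endpoint-per-voter shape as soft state learned from the voters themselves, rather than writing it into every AddVoterRecord.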