Hi José,

Thanks for the KIP! I have not had time to fully digest it, but I had some 
initial questions:

1. It seems like the proposal is to have a UUID per partition directory on the 
voter. If I understand correctly, this is sometimes referred to as "VoterUUID" 
and sometimes as "ReplicaUUID." The latter seems more accurate, since a single 
voter could have multiple of these IDs if we ever had multiple Raft topics. So 
it would be good to standardize on that name. Also, I didn't see a 
description of how this would be stored in the log directory. That would be 
good to add.
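
For what it's worth, one way to persist it would be a small properties file in 
the partition directory, analogous to the existing meta.properties file. This 
is just a sketch to make the question concrete; the file name 
"replica-uuid.properties" and the key "replica.uuid" are made up, not from the 
KIP:

    import java.io.IOException;
    import java.io.Reader;
    import java.io.Writer;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Properties;
    import java.util.UUID;

    // Sketch only: the file name and key below are hypothetical.
    public final class ReplicaUuidStore {
        private static final String FILE_NAME = "replica-uuid.properties";
        private static final String KEY = "replica.uuid";

        // Load the replica UUID for this partition directory, generating
        // and persisting a fresh one on first use.
        public static UUID loadOrCreate(Path partitionDir) throws IOException {
            Path file = partitionDir.resolve(FILE_NAME);
            Properties props = new Properties();
            if (Files.exists(file)) {
                try (Reader reader = Files.newBufferedReader(file)) {
                    props.load(reader);
                }
                return UUID.fromString(props.getProperty(KEY));
            }
            UUID uuid = UUID.randomUUID();
            props.setProperty(KEY, uuid.toString());
            try (Writer writer = Files.newBufferedWriter(file)) {
                props.store(writer, "replica UUID for this partition directory");
            }
            return uuid;
        }
    }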

2. When we originally did the Raft and Quorum Controller KIPs, one contentious 
topic was node IDs. We eventually settled on the idea that broker and 
controller IDs were in the same ID space. So you can't (for example) have a 
broker 3 that is in a separate JVM from controller 3. This is pretty easy to 
enforce with a static configuration, but it seems like it will be harder to do 
dynamically.

I would like to keep this invariant. This probably requires us to reject 
attempts to add a new quorum voter that duplicates an existing broker ID 
(except in the special case of co-location!). Similarly, we should reject 
broker registrations 
that duplicate an unrelated controller ID. The broker's incarnation ID is the 
key to doing this, I think. But that requires us to send the incarnation ID in 
many of these RPCs.
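
Concretely, the check I have in mind looks something like the sketch below. 
Every name here is hypothetical rather than taken from the KIP; the point is 
that presenting the registered broker's incarnation ID is what distinguishes 
the co-located case from an unrelated process reusing the ID:

    import java.util.Map;
    import java.util.UUID;

    // Hypothetical sketch of the ID-space invariant, not KIP text.
    public final class IdSpaceValidator {
        // Incarnation IDs of registered brokers, keyed by broker ID.
        private final Map<Integer, UUID> brokerIncarnations;

        public IdSpaceValidator(Map<Integer, UUID> brokerIncarnations) {
            this.brokerIncarnations = brokerIncarnations;
        }

        // Reject an AddVoter attempt that reuses a broker ID, unless the
        // request proves co-location by carrying that broker's incarnation ID.
        public void validateAddVoter(int voterId, UUID claimedIncarnationId) {
            UUID registered = brokerIncarnations.get(voterId);
            if (registered != null && !registered.equals(claimedIncarnationId)) {
                throw new IllegalArgumentException("Voter ID " + voterId +
                    " duplicates the ID of an unrelated broker");
            }
        }
    }

The mirror-image check would apply to broker registrations that collide with 
an existing voter ID.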

3. Is it really necessary to put the endpoint information into the 
AddVoterRecord? It seems like that could be figured out at runtime, like we do 
today. If we do need it, it seems particularly weird for it to be per-partition 
(will we have a separate TCP port for each Raft partition?). I also don't know 
why we'd want multiple endpoints. We have that for the broker because the 
endpoints have different uses, but that isn't the case here.

The original rationale for multiple endpoints on the controllers was to support 
migration from PLAINTEXT to SSL (or whatever). But that only requires multiple 
listeners to be active on the receive side, not the send side. A single voter 
never 
needs more than one endpoint to contact a peer.

Overall, I think we'd be better off keeping this as soft state rather than 
adding it to the log. Particularly if it's not in the log at all for the static 
configuration case...
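
For comparison, in the static case every peer endpoint is already derived from 
the controller.quorum.voters config, whose entries have the form id@host:port. 
Roughly (this is a simplification, not Kafka's actual parsing code):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Simplified sketch of static voter endpoint resolution.
    public final class StaticVoters {
        // Parses e.g. "1@ctrl1:9093,2@ctrl2:9093,3@ctrl3:9093" into a map
        // from voter ID to "host:port".
        public static Map<Integer, String> parse(String quorumVoters) {
            Map<Integer, String> endpoints = new LinkedHashMap<>();
            for (String voter : quorumVoters.split(",")) {
                String[] parts = voter.split("@", 2);
                endpoints.put(Integer.parseInt(parts[0].trim()), parts[1].trim());
            }
            return endpoints;
        }
    }

Note that this yields exactly one endpoint per voter, which is all a sender 
ever needs.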

4. How do you get from the static configuration situation to the dynamic one? 
Can it be done with a rolling restart? I think the answer is yes, but I wasn't 
quite sure from reading the KIP. Does a leader using the static configuration 
auto-remove voters that aren't in that static config, as well as auto-add? The 
adding behavior is spelled out, but the removing behavior is not (or maybe I 
missed it).
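
To make the question concrete, the leader-side reconciliation I am imagining 
looks like the sketch below. This is purely illustrative; whether the removal 
half exists at all is exactly what I am asking:

    import java.util.HashSet;
    import java.util.Set;

    // Illustrative sketch of static-config reconciliation, not KIP text.
    public final class VoterReconciler {
        // Compute which voters a leader would add and remove to converge
        // the dynamic voter set on the static configuration.
        public static void reconcile(Set<Integer> staticVoters,
                                     Set<Integer> currentVoters) {
            Set<Integer> toAdd = new HashSet<>(staticVoters);
            toAdd.removeAll(currentVoters);   // auto-add: spelled out in the KIP

            Set<Integer> toRemove = new HashSet<>(currentVoters);
            toRemove.removeAll(staticVoters); // auto-remove: the unspecified part

            System.out.println("add " + toAdd + ", remove " + toRemove);
        }
    }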

best,
Colin


On Thu, Jul 21, 2022, at 09:49, José Armando García Sancio wrote:
> Hi all,
>
> I would like to start the discussion on my design to support
> dynamically changing the set of voters in the KRaft cluster metadata
> topic partition.
>
> KIP URL: https://cwiki.apache.org/confluence/x/nyH1D
>
> Thanks!
> -José
