Hi Jun,

Thanks for the reply.

RE JR4: To me, the main motivation for having an explicit `--unregister` flag is that `remove-controller` and `unregister-controller` assume two different things about the supplied node. For removing a node from the KRaft voter set, no assumption is made about whether the node is running anymore -- Kafka supports either case. However, the act of unregistering a controller requires assuming that the node will "not be around soon." This is because subsequent feature upgrades will no longer consider the supported levels of an unregistered controller. An operator may decide to keep a node around as an observer, possibly with the intention to make it a voter in the future. Making the unregistration always occur alongside voter removal would make the observer controller in the example above unregister and then re-register because the node is still around. This allows for the feature upgrade race I mentioned previously (i.e. controller unregisters, operator upgrades a feature that should not be supported, controller re-registers). Therefore, I think we should have an explicit `--unregister` flag for `remove-controller` since the assumptions around the state of the cluster change compared to the base command. What do you think?

RE JR5: Yeah, I believe so. Thanks for catching this case. One could specify controller.quorum.bootstrap.servers instead of controller.quorum.voters on a controller in a static quorum. This would be a valid static config that passes the check in `KafkaConfig#validateControllerQuorumVotersMustContainNodeIdForKRaftController`. I have updated the KIP with these changes.

RE JR6: Yes, it should say "every Kafka node." I have updated the KIP to fix this.

Best,
Kevin Wu

On Thu, Apr 30, 2026 at 6:12 PM Jun Rao via dev <[email protected]> wrote:

> Hi, Kevin,
>
> Thanks for the reply.
>
> JR4. Is there a use case for `kafka-metadata-quorum remove-controller` without the `--unregister` flag? If not, could we remove the --unregister flag?
>
> JR5. For the second user experience "Unregister an observer controller in a dynamic quorum", one can have and remove an observer controller in the static quorum too, right?
>
> JR6. "Ensure the stopped voter is not part of controller.quorum.voters on any other Kafka nodes"
> "any other Kafka nodes" should be "every Kafka node", right?
>
> Jun
>
> On Mon, Apr 27, 2026 at 1:33 PM Kevin Wu <[email protected]> wrote:
>
> > Hi Jun,
> >
> > Thanks for the feedback.
> >
> > I have updated the KIP to make a separate section detailing the user experience.
> >
> > Best,
> > Kevin Wu
> >
> > On Mon, Apr 27, 2026 at 12:05 PM Jun Rao via dev <[email protected]> wrote:
> >
> > > Hi, Kevin,
> > >
> > > Thanks for the reply.
> > >
> > > It would be useful to have a separate user experience section that documents the steps for common scenarios involving the tools.
> > >
> > > The scenarios are:
> > > 1. Remove a voter in dynamic KRaft quorum
> > >    stop the voter
> > >    run kafka-metadata-quorum remove-controller with --unregister
> > > 2. Unregister an observer controller
> > >    stop the observer
> > >    run kafka-cluster unregister-controller
> > > 3. Unregister a voter in a static KRaft quorum when the static voter set is mistakenly configured.
> > >    stop the voter
> > >    run kafka-cluster unregister-controller
> > >    remove voter from controller.quorum.voters ?
> > >
> > > Jun
> > >
> > > On Fri, Apr 24, 2026 at 11:49 AM Kevin Wu <[email protected]> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the discussion.
> > > > Yeah, those are the scenarios for using these tools. I have documented their usage in the KIP.
> > > >
> > > > Best,
> > > > Kevin Wu
> > > >
> > > > On Thu, Apr 23, 2026 at 11:51 AM Jun Rao via dev <[email protected]> wrote:
> > > >
> > > > > Hi, Kevin,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > Your suggestion sounds good to me.
> > > > > It would be useful to document the usage of those tools. The scenarios are:
> > > > > 1. Remove a voter in dynamic KRaft quorum
> > > > > 2. Unregister an observer controller
> > > > > 3. Unregister a voter in a static KRaft quorum when the static voter set is mistakenly configured.
> > > > >
> > > > > For item 3, could you document how it works? Does one need to stop the misconfigured voter first and then unregister it?
> > > > >
> > > > > Are there other scenarios?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, Apr 23, 2026 at 8:22 AM Kevin Wu <[email protected]> wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for the replies.
> > > > > >
> > > > > > RE JR3: I would like the design of this feature to not introduce more coupling of the KRaft and metadata layers. Observer controllers are supported, but they are a KRaft concept, so it should not be known to the metadata layer whether or not a given controller is a voter or observer.
> > > > > >
> > > > > > What do you think about the following documentation and execution pattern regarding these CLI commands?
> > > > > >
> > > > > > `kafka-cluster unregister-controller` is a command for users when they want to unregister a controller from the cluster. We can document that this is potentially unsafe and should only be done if the operator does not intend to bring back up that controller. `kafka-cluster unregister-controller` works irrespective of the quorum mode.
> > > > > >
> > > > > > Going forward, running `kafka-metadata-quorum remove-controller` removes a controller as a KRaft voter, and continues to only be supported in a dynamic quorum cluster. I still think the unregistering behavior should be an additional flag, because having an observer controller that is still registered to the cluster is a valid configuration in Kafka. I think of `kafka-metadata-quorum remove-controller --unregister` as a "built-in" CLI script, since removing a voter and unregistering it from the cluster is probably a very common usage pattern. This command will only send UnregisterController RPC if the cluster supports dynamic quorum, so the overall command behavior is consistent with how it is today with respect to the kraft.version level of the cluster. If the cluster does not support dynamic quorum, the CLI can direct the user to instead run the `kafka-cluster unregister-controller` command.
> > > > > >
> > > > > > Best,
> > > > > > Kevin Wu
> > > > > >
> > > > > > On Tue, Apr 21, 2026 at 5:39 PM Jun Rao via dev <[email protected]> wrote:
> > > > > >
> > > > > > > Hi, Kevin,
> > > > > > >
> > > > > > > Thanks for the reply.
> > > > > > >
> > > > > > > JR2. Good point on auto-join. I think we can introduce the new UnregisterControllerRequest and keep the auto-join behavior as is (i.e., without unregistering the controller when removing the old instance from the voter). The command "kafka-metadata-quorum remove-controller" will send two separate RPC requests, RemoveRaftVoterRequest and UnregisterControllerRequest as documented in the KIP.
> > > > > > >
> > > > > > > JR3. When will a user use the command "kafka-cluster unregister-controller"? Is this only for unregistering an observer controller?
> > > > > > > If the observer controller is currently supported, we can add that command. It would be useful to document the usage for both commands.
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Tue, Apr 21, 2026 at 9:25 AM Kevin Wu <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > Thanks for the reply.
> > > > > > > >
> > > > > > > > RE JR1: Yeah, I will update KIP to touch on this static quorum edge case.
> > > > > > > >
> > > > > > > > RE JR2: That seems reasonable to me, since we would avoid two RPC hops (one for RemoveVoter, one for UnregisterController). One thing to note is that with KIP-1186
> > > > > > > > <https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/KAFKA/KIP-1186*3A*Update*AddRaftVoterRequest*RPC*to*support*auto-join__;JSsrKysrKw!!Ayb5sqE7!phwOrPrBZoQb1P44rCfpPBt74v80NjCTOGhgaRQx1XFXCy1x61QR9b9xw3zfvo-aFvVsFYczOxbTVtGeJkFHCg$>,
> > > > > > > > besides operators manually removing controllers, observer controllers themselves can send `RemoveRaftVoter` to remove their old incarnations from the voter set as part of the auto-join feature. With auto-join and this proposed behavior, explicitly removing a controller's old registration alongside its old voter set entry can lead to "unsupported" upgrades in the cluster. An operator doing these steps manually can be argued as misconfiguring the cluster, but the auto-join feature allowing for this scenario seems like a bug.
> > > > > > > >
> > > > > > > > Consider the below example with auto-join enabled: 3 controllers in the voter set (A,B,C) where A supports feature levels X=[0-1], B supports feature levels X=[0-1], but C only supports X=0. Currently, node A is the active controller, all 3 controllers are registered, but upgrading feature X to feature level 1 is not supported because C does not support it. Controller C restarts with a new disk (now represented as C'). The auto-join code runs to first remove C from the voter set, and then remove the registration for C. These records are committed via nodes A and B. Now, from the active controller's perspective, the cluster does support upgrading feature X to level 1. There is a race between C' adding itself back to the KRaft voter set and re-registering itself, and a potential feature level upgrade. Another interesting thing to note after looking at the code is that controllers can register even if they do not support the finalized features of the cluster, which is different from broker registration. In Kafka's current code, the original registration for C stays in the log after C is removed as a voter by auto-join, which prevents an upgrade of feature X. At some point, the registration for C is updated by C' because C' is a different process incarnation, but a registration that blocks X's upgrade is always in the log.
> > > > > > > >
> > > > > > > > Therefore, Kafka should not unregister a controller when auto-join removes a controller from the voter set. This means including a new RPC version for `RemoveRaftVoter` that introduces a boolean field telling the active controller whether to also unregister the controller. This field would be completely ignored by the raft layer, and instead would be handled at the ControllerApis level. I think it is fine to unregister a controller whenever the operator runs `kafka-metadata-quorum remove-controller` for a smooth UX with dynamic quorum. What do you think?
> > > > > > > >
> > > > > > > > RE JR3: Maybe we can document this better as part of the code changes to this KIP, but in my opinion, the kafka-cluster tool deals with cluster membership (brokers and controllers), which is a metadata layer concept. If you look at the `list-endpoints` command, you can list out the registered controller endpoints. Alternatively, the kafka-metadata-quorum tool deals with KRaft, which knows about concepts like leader, voter, and observers. The `add-controller` and `remove-controller` sub-commands inadvertently deal with controllers (since controllers can be voters), but the `describe` sub-command tree also shows information about brokers, which are observers to KRaft.
> > > > > > > > My decision to include the `unregister-controller` command in the `kafka-cluster` tool is mainly motivated by this distinction. Additionally, if we only send `RemoveVoterRequest` in `remove-controller`, it seems hacky to direct users to use that command for unregistering any controller, since for observers, the remove voter logic of that request will always fail in the raft layer. What do you think?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Kevin Wu
> > > > > > > >
> > > > > > > > On Tue, Apr 21, 2026 at 8:17 AM Paolo Patierno <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Kevin,
> > > > > > > > > thanks for the KIP.
> > > > > > > > > From reading it, it's not clear because not explicit, but I would assume you are going to expose a new unregisterController method through the AdminClient API as well, is my assumption right?
> > > > > > > > > I expect it would be used underneath by the tools you are going to modify.
> > > > > > > > > Having such support within the AdminClient API is important when the operator is not a human to run the tool but a Kubernetes operator (i.e. Strimzi) with the need to unregister a controller.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Paolo.
> > > > > > > > >
> > > > > > > > > On Mon, 20 Apr 2026 at 21:57, Kevin Wu <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Jun,
> > > > > > > > > >
> > > > > > > > > > Thanks for the reply.
> > > > > > > > > >
> > > > > > > > > > RE JR1: I would say the main use case is dynamic quorums, since the concept of the observer controller becomes a thing in that world. However, there is a static quorum edge case if the operator misconfigures `controller.quorum.voters`. If a new controller voter mistakenly joins the cluster, it will also persist a registration record. In my opinion, there should be a way to remove a controller registration via AdminClient CLI in all quorum modes.
> > > > > > > > > >
> > > > > > > > > > RE JR2: Yes, the existing command only removes the voter, but does not unregister the controller. I left it as a separate flag for now because they are "separate" operations in that being a raft voter is a subset of being a controller in dynamic quorums, but I am not opposed to making this command try to do both (remove voter and unregister the controller) by default. In my opinion, an observer controller is "useless" in that it does not participate in the leader election or replication parts of the KRaft protocol, so I see no issue with doing both operations always. However, an operator may want observer controllers around for other reasons like redundancy. Do you (or others) have any insight into how users may be configuring clusters with observer controllers? If not, I think it is okay to remove the flag and make it the default behavior of `kafka-metadata-quorum remove-controller`.
> > > > > > > > > >
> > > > > > > > > > RE JR3: Not exactly. The `kafka-metadata-quorum remove-controller ... --unregister` sends 2 RPCs to the active controller, one to remove a node from the voter set, and another to unregister the node. The `kafka-cluster unregister-controller` command just sends 1 RPC to the active controller to unregister the node. My motivation for having two separate commands is because `remove-controller` is associated with dynamic quorum, since the `RemoveRaftVoterRPC` will fail if the kraft.version=0. What do you think?
> > > > > > > > > >
> > > > > > > > > > RE JR4: I have updated the sections for the CLI commands in the KIP to add this information.
> > > > > > > > > >
> > > > > > > > > > RE JR5: This is describing the current implementation of the ControllerRegistrationManager, which will listen to the metadata log and send ControllerRegistrationRequest when the local node id is not registered in the log. It looks like this is slightly different from how we handle broker registration in BrokerLifecycleManager.
> > > > > > > > > > Currently, this code path never executes because controller registrations cannot be removed.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Kevin Wu
> > > > > > > > > >
> > > > > > > > > > On Fri, Apr 17, 2026 at 2:08 PM Jun Rao via dev <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi, Kevin,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the KIP. A few comments.
> > > > > > > > > > >
> > > > > > > > > > > JR1. I guess this is only intended for dynamic KRaft quorums? If so, it would be useful to clarify that.
> > > > > > > > > > >
> > > > > > > > > > > JR2. kafka-metadata-quorum remove-controller --controller-id 9990 --controller-directory-id EXAMPLE_UUID --unregister
> > > > > > > > > > > So, the existing remove-controller logic only changes the voter set, but doesn't unregister the controller? Should we just always do these two together? Is there a use case for only removing a controller from the voter set, but not unregistering?
> > > > > > > > > > >
> > > > > > > > > > > JR3. Is kafka-cluster unregister-controller equivalent to kafka-metadata-quorum remove-controller --controller-id 9990 --controller-directory-id EXAMPLE_UUID --unregister?
> > > > > > > > > > >
> > > > > > > > > > > JR4. Could you describe the underlying workflow for each new command (RPCs sent, metadata records generated, actions taken by the controller, etc)?
> > > > > > > > > > >
> > > > > > > > > > > JR5. "The registration manager of an unregistered controller already attempts to re-register with the active controller. This is to prevent accidental unregistrations."
> > > > > > > > > > > I don't quite understand this. Why will an unregistered controller attempt to re-register?
> > > > > > > > > > >
> > > > > > > > > > > Jun
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Apr 3, 2026 at 11:31 AM Kevin Wu <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi all,
> > > > > > > > > > > >
> > > > > > > > > > > > I would like to start a discussion on KIP-1312: Support unregistering controllers. Below is the KIP link.
> > > > > > > > > > > >
> > > > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/KAFKA/KIP-1312*3A*Support*unregistering*controllers__;JSsrKw!!Ayb5sqE7!phwOrPrBZoQb1P44rCfpPBt74v80NjCTOGhgaRQx1XFXCy1x61QR9b9xw3zfvo-aFvVsFYczOxbTVtFeUg-7gg$
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Kevin Wu
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Paolo Patierno
> > > > > > > > >
> > > > > > > > > *Senior Principal Software Engineer @ IBM*
> > > > > > > > > *CNCF Ambassador*
> > > > > > > > >
> > > > > > > > > Twitter : @ppatierno <https://urldefense.com/v3/__http://twitter.com/ppatierno__;!!Ayb5sqE7!phwOrPrBZoQb1P44rCfpPBt74v80NjCTOGhgaRQx1XFXCy1x61QR9b9xw3zfvo-aFvVsFYczOxbTVtHGG-mS-Q$>
> > > > > > > > > Linkedin : paolopatierno <https://urldefense.com/v3/__http://it.linkedin.com/in/paolopatierno__;!!Ayb5sqE7!phwOrPrBZoQb1P44rCfpPBt74v80NjCTOGhgaRQx1XFXCy1x61QR9b9xw3zfvo-aFvVsFYczOxbTVtFcWWCD5g$>
> > > > > > > > > GitHub : ppatierno <https://urldefense.com/v3/__https://github.com/ppatierno__;!!Ayb5sqE7!phwOrPrBZoQb1P44rCfpPBt74v80NjCTOGhgaRQx1XFXCy1x61QR9b9xw3zfvo-aFvVsFYczOxbTVtEK-wncPw$>
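
The feature-upgrade race discussed in this thread (three controllers A, B, C, where only C lacks support for X=1) can be sketched as a small simulation. This is an illustrative model only, not Kafka code: the `Controller` class, the `can_upgrade` helper, and the rule that an upgrade is allowed only when every currently registered controller supports the target level are simplifying assumptions taken from the description in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controller:
    """Hypothetical stand-in for a controller registration record."""
    node_id: str
    max_supported_level: int  # highest level of feature X this node supports


def can_upgrade(registrations, target_level):
    """Assumed rule: every *registered* controller must support the target level."""
    return all(c.max_supported_level >= target_level for c in registrations.values())


# A and B support X=[0-1]; C only supports X=0.
registered = {
    "A": Controller("A", 1),
    "B": Controller("B", 1),
    "C": Controller("C", 0),
}

# While C is registered, upgrading X to level 1 is rejected.
allowed_before = can_upgrade(registered, 1)

# Voter removal *with* unregistration drops C's registration record.
del registered["C"]

# Race window: C' has not re-registered yet, so the same check now passes.
allowed_during_window = can_upgrade(registered, 1)

# C' (same supported levels as C) re-registers after the upgrade went through.
finalized_level = 1 if allowed_during_window else 0
registered["C"] = Controller("C", 0)
unsupported_after = registered["C"].max_supported_level < finalized_level

print(allowed_before, allowed_during_window, unsupported_after)
```

In this model, leaving C's registration in place when it is removed as a voter (the behavior without `--unregister`) keeps the first check failing, so the window never opens.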
