Thanks for the KIP, Colin. Apologies if some of these points have already been made; I have not followed the discussion closely.

1. Re: "Periodically, each controller will check that the controller registration for its ID is as expected."

Does this need to be periodic? Can't the controller schedule this RPC (with retries, etc.) when it finds that the incarnation ID doesn't match its own?

2. Did you consider including the active controller's epoch in the ControllerRegistrationRequest? This would allow the active controller to reject registrations from controllers that are not part of the active quorum and don't know the latest controller epoch. This should mitigate some of the concerns you raised in bullet point 1.

3. Which endpoint will the inactive controllers use to send the ControllerRegistrationRequest? Will they use the first endpoint described in the cluster metadata controller registration record? Or would they use the endpoint described in the server configuration at controller.quorum.voters?

4. Re: Raft integration in the rejected alternatives

Yes, the KRaft layer needs to solve a similar problem, like endpoint discovery, to support dynamic controller membership change. As you point out, the requirements are different and the set of information that needs to be tracked is different. I think it is okay to use a different solution for each of these problems.

Thanks!
--
-José
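
P.S. To make point 2 concrete, here is a rough sketch of the kind of check I have in mind on the active controller. It is only an illustration: every class, method, and enum name below is hypothetical (none of them come from the KIP or the Kafka code base), and it assumes the request carries the sender's view of the controller epoch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not the KIP's design or Kafka's actual API.
final class RegistrationFencingSketch {
    enum Result { ACCEPTED, EPOCH_MISMATCH }

    // Epoch of the controller handling the request (assumed to be the active one).
    private final int activeControllerEpoch;
    // Controller ID -> incarnation ID of the last accepted registration.
    private final Map<Integer, String> registrations = new HashMap<>();

    RegistrationFencingSketch(int activeControllerEpoch) {
        this.activeControllerEpoch = activeControllerEpoch;
    }

    Result handleRegistration(int controllerId, String incarnationId, int requestEpoch) {
        // A sender that does not know the current epoch is rejected and is
        // expected to catch up on the metadata log before retrying.
        if (requestEpoch != activeControllerEpoch) {
            return Result.EPOCH_MISMATCH;
        }
        registrations.put(controllerId, incarnationId);
        return Result.ACCEPTED;
    }

    // Returns null if no registration has been accepted for this ID.
    String registeredIncarnation(int controllerId) {
        return registrations.get(controllerId);
    }
}
```

With something like this, a controller that is behind would get EPOCH_MISMATCH back instead of overwriting the registration, and would retry once it has learned the latest epoch.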