On Tue, Jul 25, 2023, at 05:30, Luke Chen wrote: > Hi Colin, > > Some more comments: > 1. In the KIP, we mentioned "controller heartbeats", but it is not > explained anywhere. > I think "controller heartbeats" = controller registration", is that > correct? > If no, please explain more about it in the KIP.
Hi Luke, Sorry, this was an artifact of earlier revisions. I have changed it to "ControllerRegistration" in all the cases where I didn't update it before. > > 2. Following this question: > > Which endpoint will the inactive controllers use to send the > > ControllerRegistrationRequest? > > A: They will use the endpoint in controller.quorum.voters. > If the registration request will include controller.quorum.voters, why > bother sending this information to active controller again? > The active controller should already have all the controller.quorum.voters > when start up. > Any purpose of that design? For validation? The controllers don't know which endpoint controller.quorum.voters is referencing. > > 3. If a controller node is not part of `controller.quorum.voters`, when it > sends ControllerRegistrationRequest, what will we respond to it? > Good point. I added a new error, UNKNOWN_CONTROLLER_ID, for this case. > 4. Nice and clear compatibility matrix! > Thanks! Colin > Thank you. > Luke > > On Sat, Jul 22, 2023 at 3:38 AM Colin McCabe <cmcc...@apache.org> wrote: > >> On Fri, Jul 21, 2023, at 09:43, José Armando García Sancio wrote: >> > Thanks for the KIP Colin. Apologies if some of these points have >> > already been made. I have not followed the discussion closely: >> > >> > 1. Re: Periodically, each controller will check that the controller >> > registration for its ID is as expected >> > >> > Does this need to be periodic? Can't the controller schedule this RPC, >> > retry etc, when it finds that the incarnation ID doesn't match its >> > own? >> > >> >> Hi José, >> >> Thanks for the reviews. >> >> David had the same question. I agree that it should be event-driven rather >> than periodic (except for retries, etc.) >> >> > >> > 2. Did you consider including the active controller's epoch in the >> > ControllerRegistrationRequest? >> > >> > This would allow the active controller to reject registration from >> > controllers that are not part of the active quorum and don't know the >> > latest controller epoch. This should mitigate some of the concerns you >> > raised in bullet point 1. >> > >> >> Good idea. I will add the active controller epoch to the registration >> request. >> >> > >> > 3. Which endpoint will the inactive controllers use to send the >> > ControllerRegistrationRequest? >> > >> > Will it use the first endpoint described in the cluster metadata >> > controller registration record? Or would it use the endpoint described >> > in the server configuration at controller.quorum.voters? >> > >> >> They will use the endpoint in controller.quorum.voters. In general, the >> endpoints from the registration are only used for responding to >> DESCRIBE_CLUSTER. Since, after all, we may not even have the registration >> endpoints when we start up. >> >> > >> > 4. Re: Raft integration in the rejected alternatives >> > >> > Yes, The KRaft layer needs to solve a similar problem like endpoint >> > discovery to support dynamic controller membership change. As you >> > point out the requirements are different and the set of information >> > that needs to be tracked is different. I think it is okay to use a >> > different solution for each of these problems. >> >> Yeah that was my feeling too. Thanks for taking a look. >> >> regards, >> Colin >> >> > >> > Thanks! >> > -- >> > -José >>