showuon commented on PR #20859: URL: https://github.com/apache/kafka/pull/20859#issuecomment-3587572361
> Thanks for the changes @TaiJuWu. Leaving a review of the current approach in `src/main`, but please consider the below as an alternative approach (I think the below is simpler, but I would like @jsancio and @showuon 's input on it). In the interest of not duplicating effort, I think we should all agree on a design first. > > Sorry for the back and forth, but I think another, potentially easier way to go about this is actually to make the `ADD_RAFT_VOTER` sent by auto-join idempotent for the lifetime of the controller process. Basically, the leader will look through its voter set history and return a successful response if the requested node + incarnation ID was ever part of its voter set history. > > This approach requires a KIP, because we need to send the incarnation ID as part of the request, and the VotersRecord schema needs to be updated (meaning we need a kraft.version=2 as well I think), but it is much easier to prove this approach's correctness. > > You can think of our current approach as trying to handle every single edge case on the client-side, which is difficult to reason about because of partitioning + async nature of replication, whereas this approach makes our state change operation idempotent server-side, so the leader does not care if the client (a given controller process incarnation + `ReplicaKey` tuple) sends additional add voter requests. @kevin-wu24 , I'm +1 on having a KIP to have the gate in the leader controller side. After all, the controllers addition/removal are all handling in the leader controller side. I agree that will be much reliable and clean way to handle it. But for v4.2, I think we still need to have a temporary solution for it, which should be this PR. My imagination is after the improved KIP merged, we can revert the solutions in this PR. And in users point of view, there is no any change when using v4.2 and v4.3. So my suggestion is: 1. in v4.2, we adopt the temporary solution in this PR 2. in v4.3 or later, we adopt the new way which gating in the leader controller side, which requires a KIP, and then revert this PR after KIP implemented. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
