showuon commented on PR #20859:
URL: https://github.com/apache/kafka/pull/20859#issuecomment-3587572361

   > Thanks for the changes @TaiJuWu. Leaving a review of the current approach 
in `src/main`, but please consider the below as an alternative approach (I 
think the below is simpler, but I would like @jsancio and @showuon 's input on 
it). In the interest of not duplicating effort, I think we should all agree on 
a design first.
   > 
   > Sorry for the back and forth, but I think another, potentially easier way 
to go about this is actually to make the `ADD_RAFT_VOTER` sent by auto-join 
idempotent for the lifetime of the controller process. Basically, the leader 
will look through its voter set history and return a successful response if the 
requested node + incarnation ID was ever part of its voter set history.
   > 
   > This approach requires a KIP, because we need to send the incarnation ID 
as part of the request, and the VotersRecord schema needs to be updated 
(meaning we need a kraft.version=2 as well I think), but it is much easier to 
prove this approach's correctness.
   > 
   > You can think of our current approach as trying to handle every single 
edge case on the client-side, which is difficult to reason about because of 
partitioning + async nature of replication, whereas this approach makes our 
state change operation idempotent server-side, so the leader does not care if 
the client (a given controller process incarnation + `ReplicaKey` tuple) sends 
additional add voter requests.
   
   @kevin-wu24 , I'm +1 on having a KIP to have the gate in the leader 
controller side. After all, the controllers addition/removal are all handling 
in the leader controller side. I agree that will be much reliable and clean way 
to handle it. 
   
   But for v4.2, I think we still need to have a temporary solution for it, 
which should be this PR. My imagination is after the improved KIP merged, we 
can revert the solutions in this PR. And in users point of view, there is no 
any change when using v4.2 and v4.3. 
   
   So my suggestion is:
   1. in v4.2, we adopt the temporary solution in this PR
   2. in v4.3 or later, we adopt the new way which gating in the leader 
controller side, which requires a KIP, and then revert this PR after KIP 
implemented.
   
   WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to