kevin-wu24 commented on PR #20859:
URL: https://github.com/apache/kafka/pull/20859#issuecomment-3584172984

   > we can resolve this issue because the state will move to HAS_JOINED when 
finding the voter contains itself. But we can handle the issue separately since 
it's not directly related to this issue. Opened 
[KAFKA-19933](https://issues.apache.org/jira/browse/KAFKA-19933).
   
   The follower can only move to HAS_JOINED by processing a FETCH_RESPONSE, 
which is too late (i.e. the local node has already sent an "incorrect" 
auto-join). In the first scenario, depending on whether the `UpdateVoterSet` 
timer has expired, the follower will either send an "incorrect" 
AddVoterRequest or send a FetchRequest (which will lead to the correct state, 
assuming we get a response). 
   
   IMO this is the harder case to handle, and handling it on the leader side 
via `KAFKA-19933` is kind of "wrong" too. I think the observer needs to look at 
its fetch timer, and if that has expired, sending a FETCH should take priority 
over running the auto-join algorithm. That way, long-partitioned followers will 
fetch first and avoid the above case. The proposed fix here is to add 
`&& !hasFetchTimeoutExpired()` to `shouldSendAddOrRemoveVoter`.
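
   A minimal sketch of that ordering, to make the proposal concrete. This is a 
simplified model, not the actual `KafkaRaftClient` code: the class 
`AutoJoinGuard`, the `NextRequest` enum, and the boolean parameters are 
hypothetical stand-ins for the follower's timer checks.

```java
// Simplified, hypothetical model of the follower's request choice.
// This is NOT the actual KafkaRaftClient code.
final class AutoJoinGuard {
    enum NextRequest { FETCH, ADD_VOTER }

    // Proposed guard: run auto-join only if the UpdateVoterSet timer has
    // expired AND the fetch timer has not expired.
    static boolean shouldSendAddOrRemoveVoter(
        boolean updateVoterSetTimerExpired,
        boolean fetchTimeoutExpired
    ) {
        return updateVoterSetTimerExpired && !fetchTimeoutExpired;
    }

    // When the guard does not hold, the follower falls back to fetching.
    static NextRequest nextRequest(
        boolean updateVoterSetTimerExpired,
        boolean fetchTimeoutExpired
    ) {
        return shouldSendAddOrRemoveVoter(updateVoterSetTimerExpired, fetchTimeoutExpired)
            ? NextRequest.ADD_VOTER
            : NextRequest.FETCH;
    }

    public static void main(String[] args) {
        // Long-partitioned follower: both timers expired, so it fetches first.
        System.out.println(nextRequest(true, true));   // FETCH
        // Follower that fetched recently and whose UpdateVoterSet timer expired.
        System.out.println(nextRequest(true, false));  // ADD_VOTER
    }
}
```

   With this guard, a follower whose fetch timer has expired always sends a 
FETCH before it considers auto-joining.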
   
   I think my previous assertion about the conditions for transitioning from 
`HAS_NOT_JOINED -> HAS_JOINED` is incorrect. Suppose the local node C, which 
currently believes the voter set is (A,B) and is in the `HAS_NOT_JOINED` state, 
is so far behind that it requires two fetches. If the first fetch does not 
contain any new VotersRecords, but the second fetch would contain the records 
(A,B,C) and (A,B), local node C would still try to auto-join "incorrectly"...
   
   We might need to add another clause to `shouldSendAddOrRemoveVoter`: 
`&& local node LEO >= leader HWM`.
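
   A rough sketch of what that extra clause could look like, assuming the 
follower tracks its own log end offset (LEO) and the leader's high watermark 
(HWM) as reported in the last fetch response. The class `CaughtUpGuard`, the 
parameter names, and the offsets below are hypothetical and only illustrate 
the two-fetch scenario; this is not the actual `KafkaRaftClient` code.

```java
// Simplified, hypothetical model; NOT the actual KafkaRaftClient code.
final class CaughtUpGuard {
    // Auto-join is allowed only when:
    //  - the UpdateVoterSet timer has expired,
    //  - the fetch timer has not expired, and
    //  - the local log has reached the leader's last known high watermark.
    static boolean shouldSendAddOrRemoveVoter(
        boolean updateVoterSetTimerExpired,
        boolean fetchTimeoutExpired,
        long localLogEndOffset,
        long leaderHighWatermark
    ) {
        return updateVoterSetTimerExpired
            && !fetchTimeoutExpired
            && localLogEndOffset >= leaderHighWatermark;
    }

    public static void main(String[] args) {
        long leaderHwm = 250L; // made-up offset for illustration

        // After the first fetch, node C is still behind the leader's HWM and
        // has not yet seen the newer VotersRecords, so auto-join is blocked.
        System.out.println(shouldSendAddOrRemoveVoter(true, false, 100L, leaderHwm)); // false

        // After the second fetch, node C has reached the HWM, so the guard no
        // longer blocks the auto-join decision.
        System.out.println(shouldSendAddOrRemoveVoter(true, false, 250L, leaderHwm)); // true
    }
}
```

   The idea is that the node only evaluates auto-join against a voter set 
derived from a log that has caught up to the leader's committed state.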

