ahuang98 commented on code in PR #18240:
URL: https://github.com/apache/kafka/pull/18240#discussion_r1899341945


##########
raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java:
##########
@@ -2935,14 +3014,18 @@ private long pollResigned(long currentTimeMs) {
             // until either the shutdown expires or an election bumps the epoch
             stateTimeoutMs = shutdown.remainingTimeMs();
         } else if (state.hasElectionTimeoutExpired(currentTimeMs)) {
-            if (quorum.isVoter()) {
-                transitionToCandidate(currentTimeMs);
-            } else {
+//            if (quorum.isVoter()) {
+                // canElectNewLeaderAfterOldLeaderPartitioned fails if we do not bump epoch since it is possible
+                // that the replica ends up as follower in the same epoch.
+                // resigned(leaderId=local) -> prospective(leaderId=local) -> follower(leaderId=local) which is illegal
+//                transitionToProspective(quorum.epoch() + 1, currentTimeMs);
+//                transitionToCandidate(currentTimeMs);
+//            } else {

Review Comment:
   The existing raft event simulation tests picked up a new bug in pollResigned: if we simply replace transitionToCandidate(currentTimeMs) with transitionToProspective(currentTimeMs), a cordoned leader in epoch 5 could resign in epoch 5, transition to prospective in epoch 5 (with leaderId=localId), fail the election, and then attempt to become a follower of itself in epoch 5.
   
   There are a few alternatives, each with pros and cons:
   - resigned voter in epoch X transitions to prospective in epoch X+1
       - con: needs a special code path just for this case to allow becoming prospective in epoch X+1 (and adds minor complexity for determining whether votedKey or leaderId should be kept from the prior transition). Transitioning to prospective in epoch X+1 is almost as disruptive as transitioning directly to candidate, since it involves an epoch bump
       - pro: probably the option that follows the intent of the past logic most closely
   - resigned voter in epoch X simply transitions to unattached in epoch X+1 (current version)
       - con: the resigned replica has to wait two election timeouts after resignation to become prospective
       - pro: simplified logic. Unless this is the only replica eligible for leadership in the quorum (e.g. due to network partitioning), the impact of waiting two election timeouts after resignation is small: all other replicas should be starting their own elections within a single fetch timeout/election timeout
   - resigned voter in epoch X instead waits a smaller backoffTimeMs before transitioning to unattached in epoch X+1
       - con: scope creep, additional changes to resignedState
       - pro: the resigned voter waits less time before becoming eligible to start a new election.
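
   To make the illegal path and the two epoch-bump alternatives concrete, here is a rough sketch using a toy state model. Every class, method, and constant name below is hypothetical and is not the KafkaRaftClient API. The fallback rule in onElectionLost (follow the known leader, otherwise become unattached) is an assumption made only for illustration.

```java
import java.util.OptionalInt;

// Toy model only: every name here is hypothetical, not the KafkaRaftClient API.
class ResignedTransitionSketch {
    enum Role { RESIGNED, PROSPECTIVE, FOLLOWER, UNATTACHED }

    // Assumed id of the local replica.
    static final int LOCAL_ID = 1;

    static final class State {
        final Role role;
        final int epoch;
        final OptionalInt leaderId;

        State(Role role, int epoch, OptionalInt leaderId) {
            this.role = role;
            this.epoch = epoch;
            this.leaderId = leaderId;
        }
    }

    // A leader that resigned in `epoch` still records itself as that epoch's leader.
    static State resignedLeader(int epoch) {
        return new State(Role.RESIGNED, epoch, OptionalInt.of(LOCAL_ID));
    }

    // Naive replacement: same epoch, so leaderId=local is carried over.
    static State toProspectiveSameEpoch(State from) {
        return new State(Role.PROSPECTIVE, from.epoch, from.leaderId);
    }

    // First alternative: prospective in epoch X+1, where no leader is known yet.
    static State toProspectiveNextEpoch(State from) {
        return new State(Role.PROSPECTIVE, from.epoch + 1, OptionalInt.empty());
    }

    // Second alternative: unattached in epoch X+1.
    static State toUnattachedNextEpoch(State from) {
        return new State(Role.UNATTACHED, from.epoch + 1, OptionalInt.empty());
    }

    // Assumed fallback after a failed election: follow the known leader,
    // otherwise become unattached in the same epoch.
    static State onElectionLost(State prospective) {
        if (prospective.leaderId.isPresent()) {
            if (prospective.leaderId.getAsInt() == LOCAL_ID) {
                throw new IllegalStateException(
                    "replica " + LOCAL_ID + " cannot follow itself in epoch " + prospective.epoch);
            }
            return new State(Role.FOLLOWER, prospective.epoch, prospective.leaderId);
        }
        return new State(Role.UNATTACHED, prospective.epoch, OptionalInt.empty());
    }

    public static void main(String[] args) {
        State resigned = resignedLeader(5);

        // Same-epoch path: resigned -> prospective -> follower of self, which is rejected.
        try {
            onElectionLost(toProspectiveSameEpoch(resigned));
        } catch (IllegalStateException e) {
            System.out.println("same epoch: " + e.getMessage());
        }

        // Epoch-bump paths: no stale leaderId survives into epoch 6.
        State afterLoss = onElectionLost(toProspectiveNextEpoch(resigned));
        System.out.println("prospective in X+1, lost election: " + afterLoss.role + " in epoch " + afterLoss.epoch);

        State unattached = toUnattachedNextEpoch(resigned);
        System.out.println("unattached in X+1: " + unattached.role + " in epoch " + unattached.epoch);
    }
}
```

   In this sketch, both epoch-bump options avoid the self-follow simply because the stale leaderId is not carried into the new epoch. The sketch does not model the timing trade-offs (one vs. two election timeouts, or backoffTimeMs) discussed above.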


