ahuang98 commented on code in PR #18240:
URL: https://github.com/apache/kafka/pull/18240#discussion_r1899341945
##########
raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java:
##########
@@ -2935,14 +3014,18 @@ private long pollResigned(long currentTimeMs) {
             // until either the shutdown expires or an election bumps the epoch
             stateTimeoutMs = shutdown.remainingTimeMs();
         } else if (state.hasElectionTimeoutExpired(currentTimeMs)) {
-            if (quorum.isVoter()) {
-                transitionToCandidate(currentTimeMs);
-            } else {
+//            if (quorum.isVoter()) {
+                // canElectNewLeaderAfterOldLeaderPartitioned fails if we do not bump epoch since it is possible
+                // that the replica ends up as follower in the same epoch.
+                // resigned(leaderId=local) -> prospective(leaderId=local) -> follower(leaderId=local) which is illegal
+//                transitionToProspective(quorum.epoch() + 1, currentTimeMs);
+//                transitionToCandidate(currentTimeMs);
+//            } else {

Review Comment:
The existing Raft event simulation tests picked up a new bug in pollResigned: if we simply replace transitionToCandidate(currentTimeMs) with transitionToProspective(currentTimeMs), a cordoned leader in epoch 5 could resign in epoch 5, transition to prospective in epoch 5 (with leaderId=localId), fail the election, and then attempt to become a follower of itself in epoch 5.

So far, these are the alternatives that seem reasonable to me:
- A resigned voter in epoch X transitions to prospective in epoch X+1.
  - Con: we need a special code path just for this case to allow becoming prospective in epoch X+1 (it would also add a small amount of complexity for deciding whether votedKey or leaderId should be kept from the prior transition). Transitioning to prospective in epoch X+1 is also almost as disruptive as transitioning directly to candidate, since it involves an epoch bump.
  - Pro: probably the option that most closely follows the intent of the past logic.
- A resigned voter in epoch X simply transitions to unattached in epoch X+1 (current version).
  - Con: the resigned replica has to wait two election timeouts after resignation before it can become prospective.
  - Pro: simpler logic. Unless this is the only replica eligible for leadership in the quorum (e.g. due to a network partition), the impact of waiting two election timeouts after resignation is small: all other replicas should start their own elections within a single fetch timeout or election timeout.
- A resigned voter in epoch X instead waits a smaller backoffTimeMs before transitioning to unattached in epoch X+1.
  - Con: scope creep. What should this backoff be? It also requires additional changes to resignedState.
  - Pro: the resigned voter waits less time before becoming eligible to start a new election.
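To make the failure mode and the "unattached in epoch X+1" option concrete, here is a minimal, hypothetical Java sketch. The Kind enum, the State record, LOCAL_ID, and every method below are invented for illustration and are not KafkaRaftClient's real API. It assumes (following the description above, not the actual PR code) that a prospective replica which loses its election falls back to following the leader it already knows in the same epoch, or to unattached if it knows none, and that becoming a follower of yourself is illegal.

```java
import java.util.OptionalInt;

// Hypothetical, heavily simplified model of the pollResigned transitions discussed above.
// None of these types or methods exist in KafkaRaftClient; they only illustrate the invariant
// that a replica must never become a follower of itself.
public final class ResignedTransitionSketch {
    enum Kind { RESIGNED, UNATTACHED, PROSPECTIVE, FOLLOWER }

    record State(Kind kind, int epoch, OptionalInt leaderId) { }

    // Hypothetical id of the local replica (the former leader).
    static final int LOCAL_ID = 1;

    // Assumption: a prospective replica that loses its election follows the leader it already
    // knows in the same epoch, or becomes unattached if it knows no leader.
    static State onProspectiveElectionLost(State prospective) {
        if (prospective.leaderId().isPresent()) {
            return becomeFollower(prospective.epoch(), prospective.leaderId().getAsInt());
        }
        return new State(Kind.UNATTACHED, prospective.epoch(), OptionalInt.empty());
    }

    // Following yourself is the illegal transition the simulation tests caught.
    static State becomeFollower(int epoch, int leaderId) {
        if (leaderId == LOCAL_ID) {
            throw new IllegalStateException("cannot become follower of self in epoch " + epoch);
        }
        return new State(Kind.FOLLOWER, epoch, OptionalInt.of(leaderId));
    }

    // Naive replacement: resigned(epoch X) -> prospective(epoch X), carrying over leaderId=local.
    static State naiveTimeout(State resigned) {
        return new State(Kind.PROSPECTIVE, resigned.epoch(), resigned.leaderId());
    }

    // Option described as the current version: resigned(epoch X) -> unattached(epoch X+1), no leader.
    static State bumpEpochTimeout(State resigned) {
        return new State(Kind.UNATTACHED, resigned.epoch() + 1, OptionalInt.empty());
    }

    // Second election timeout: unattached -> prospective in the same epoch, still no leader.
    static State unattachedTimeout(State unattached) {
        return new State(Kind.PROSPECTIVE, unattached.epoch(), OptionalInt.empty());
    }

    public static void main(String[] args) {
        State resigned = new State(Kind.RESIGNED, 5, OptionalInt.of(LOCAL_ID));

        // Naive path: losing the election tries to follow ourselves in epoch 5 and throws.
        try {
            onProspectiveElectionLost(naiveTimeout(resigned));
        } catch (IllegalStateException e) {
            System.out.println("naive: " + e.getMessage());
        }

        // Epoch-bump path: two timeout-driven transitions, and a lost election stays unattached.
        State afterLoss = onProspectiveElectionLost(unattachedTimeout(bumpEpochTimeout(resigned)));
        System.out.println("bumped: " + afterLoss);
    }
}
```

In this model the naive path throws because the leaderId carried over from the resigned state is the local replica, while the epoch-bump path drops that leaderId and ends up unattached in epoch 6. The bumped path also needs two timeout-driven steps (resigned to unattached, then unattached to prospective), which mirrors the "two election timeouts" cost listed as that option's con.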