Hi Alyssa,

1. In the schema for VoteRequest and VoteResponse, you are using "boolean" as the type keyword. The correct keyword should be "bool" instead.
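For example, a field declaration in the request schema might look roughly like the sketch below. Only the field name PreVote and the "bool" keyword come from this discussion; the version range and the description text are placeholders, not taken from the KIP.

```json
{
  "name": "PreVote",
  "type": "bool",
  "versions": "1+",
  "about": "Placeholder description: whether the request is a pre-vote."
}
```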
2. In the states and state transition table you have the following entry:

> * Candidate transitions to:
>   * ...
>   * Prospective: After expiration of the election timeout

Can you explain the reason a candidate would transition back to prospective? If a voter transitions to the candidate state, it is because the voters don't support KIP-996 or the replica was able to win the majority of the votes at some point in the past. Are we concerned that the network partition might have occurred after the replica has become a candidate? If so, I think we should state this explicitly in the KIP.

3. In the Proposed changes section and state transition section, I think it would be helpful to explicitly state that we have an invariant that only the prospective state can transition to the candidate state. This transition to the candidate state from the prospective state can only happen because the replica won the majority of the votes or there is at least one remote voter that doesn't support pre-vote.

4. I am a bit confused by this paragraph:

> A candidate will now send a VoteRequest with the PreVote field set to true
> and CandidateEpoch set to its [epoch + 1] when its election timeout expires.
> If [majority - 1] of VoteResponse grant the vote, the candidate will then
> bump its epoch up and send a VoteRequest with PreVote set to false which is
> our standard vote that will cause state changes for servers receiving the
> request.

I am assuming that "candidate" refers to the states enumerated on the table above this quote. If so, I think you mean "prospective" for the first candidate. CandidateEpoch should be ReplicaEpoch. [epoch + 1] should just be epoch. I thought we agreed that replicas will always send their current epoch to the remote replicas.

5. I am a bit confused by this bullet section:

> true if the server receives less than [majority] VoteResponse with
> VoteGranted set to false within [election.timeout.ms + a little randomness]
> and the first bullet point does not apply
> Explanation for why we don't send a standard vote at this point is
> explained in rejected alternatives.

Can we explain this case in plain English? I assume that this case is trying to cover the scenario where the election timer expired but the prospective candidate hasn't received enough votes (granted or rejected) to decide whether it could win an election.

6.

> Yes. If a leader is unable to receive fetch responses from a majority of
> servers, it can impede followers that are able to communicate with it from
> voting in an eligible leader that can communicate with a majority of the
> cluster.

In general, leaders don't receive fetch responses. They receive FETCH requests. Did you mean "if a leader is able to send FETCH responses to the majority - 1 of the voters, it can impede fetching voters (followers) from granting their vote to prospective candidates. This should stop prospective candidates from getting enough votes to transition to the candidate state and increase their epoch"?

7.

> Check Quorum ensures a leader steps down if it is unable to receive fetch
> responses from a majority of servers.

I think you mean "... if it is unable to receive FETCH requests from the majority - 1 of the voters".

8. At the end of the Proposed changes section you have the following:

> The logic now looks like the following for servers receiving VoteRequests
> with PreVote set to true:
>
> When servers receive VoteRequests with the PreVote field set to true, they
> will respond with VoteGranted set to
>
> * true if they are not a Follower and the epoch and offsets in the Pre-Vote
> request satisfy the same requirements as a standard vote
> * false if they are a Follower or the epoch and end offsets in the Pre-Vote
> request do not satisfy the requirements

This seems to duplicate the same algorithm that was stated earlier in the section.

9. I don't understand this rejected idea: "Sending Standard Votes after failure to win Pre-Vote". In your example in the "Disruptive server scenarios" section, voters 4 and 5 are partitioned from the majority of the voters. We don't want voters 4 and 5 increasing their epoch and transitioning to the candidate state; otherwise they would disrupt the quorum established by voters 1, 2 and 3.

Thanks,
--
-José