[ https://issues.apache.org/jira/browse/KAFKA-13621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534578#comment-17534578 ]
Jose Armando Garcia Sancio commented on KAFKA-13621:
----------------------------------------------------

[~hachikuji] mentioned that once KIP-835 is approved and implemented, the leader is guaranteed to append a record every `metadata.monitor.write.interval.ms` (or `controller.monitor.write.interval.ms`). We can use this feature to resign the leader if written records don't commit after a fetch timeout.

I think this solution seems reasonable. I have the following questions:
# Should we do this in the Controller/metadata module or in the raft module?
# How do we handle quorums that are fetching but are slow to catch up to the high-watermark because they have a small log?

> Resign leader on network partition
> ----------------------------------
>
>                 Key: KAFKA-13621
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13621
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Jose Armando Garcia Sancio
>            Assignee: Jose Armando Garcia Sancio
>            Priority: Major
>
> h1. Motivation
> If the current leader A at epoch X gets partitioned from the rest of the
> quorum, quorum voter A will stay leader at epoch X. This happens because
> voter A will never receive a request from the rest of the voters increasing
> the epoch. The requests that typically increase the epoch of past leaders
> are BeginQuorumEpoch and Vote.
> In addition, if voter A (leader at epoch X) doesn't get partitioned from the
> rest of the brokers (observers in the KRaft protocol), the brokers will never
> learn about the new quorum leader. This happens because 1. observers learn
> about the leader from the Fetch response and 2. observers send a Fetch request
> to a random voter if the Fetch request times out.
> Neither of these two mechanisms will cause the broker to send a request to a
> different voter, because the leader at epoch X will never send a different
> leader in the response and the broker will never send a Fetch request to a
> different voter because the Fetch request will never time out.
> h1. Proposed Changes
> In this scenario A, the leader at epoch X, will stop receiving Fetch
> requests from the majority of the voters. Voter A should resign as leader if
> the most recent Fetch requests from the majority of the voters are old
> enough. A reasonable value for "old enough" is the Fetch timeout value.
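
The resignation check proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Kafka's raft module: the class `LeaderFetchTracker`, its methods, and the choice to count the leader itself toward the majority are all hypothetical. It assumes the leader records the time of the last Fetch from each voter and treats a voter as reachable while that Fetch is newer than the Fetch timeout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Kafka's raft module. */
final class LeaderFetchTracker {
    private final int voterCount;
    private final long fetchTimeoutMs;
    // Last Fetch time per remote voter (the leader itself is not stored).
    private final Map<Integer, Long> lastFetchMs = new HashMap<>();

    /** Assumes localId is a member of voters; timers start at election time. */
    LeaderFetchTracker(int localId, Set<Integer> voters, long fetchTimeoutMs, long electedAtMs) {
        this.voterCount = voters.size();
        this.fetchTimeoutMs = fetchTimeoutMs;
        for (int voter : voters) {
            if (voter != localId) {
                lastFetchMs.put(voter, electedAtMs);
            }
        }
    }

    /** Record a Fetch request; Fetches from non-voters (observers) are ignored. */
    void onFetch(int replicaId, long nowMs) {
        if (lastFetchMs.containsKey(replicaId)) {
            lastFetchMs.put(replicaId, nowMs);
        }
    }

    /** True when fewer than a majority of voters have fetched within the timeout. */
    boolean shouldResign(long nowMs) {
        int reachable = 1; // assumption: the leader counts itself
        for (long fetchedAtMs : lastFetchMs.values()) {
            if (nowMs - fetchedAtMs < fetchTimeoutMs) {
                reachable++;
            }
        }
        int majority = voterCount / 2 + 1;
        return reachable < majority;
    }
}
```

With three voters and a 2000 ms Fetch timeout, a leader that hears from neither follower for 2000 ms would resign under this sketch, while a single follower that keeps fetching is enough to keep it in office. Observers fetching from the leader do not count, which matches the second scenario in the Motivation section.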