[
https://issues.apache.org/jira/browse/KAFKA-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263042#comment-17263042
]
Jose Armando Garcia Sancio commented on KAFKA-12181:
----------------------------------------------------
This analysis makes sense to me. I think we can relax the fetch offset
validation per replica. As you point out the leader needs to make sure "that
its high watermark does not go backwards". The {{LeaderState}} implementation
already makes sure that high-watermark is monotonically increasing.
[https://github.com/apache/kafka/blob/aedb53a4e6313398ed9626fbd2fbd7106e2ce94e/raft/src/main/java/org/apache/kafka/raft/LeaderState.java#L118-L120]
The {{KafkRaftClient}} also stores the high-watermark in the {{FollowerState}}
and {{CandidateState}}. To avoid having the state machine reload it's state
after becoming leader, we need to extend the implementation of
{{QuorumState::transitionToLeader}} such that the candidate's high-watermark is
set as the "minimum" leader high-watermark.
> Loosen monotonic fetch offset validation by raft leader
> -------------------------------------------------------
>
> Key: KAFKA-12181
> URL: https://issues.apache.org/jira/browse/KAFKA-12181
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Jason Gustafson
> Assignee: Jason Gustafson
> Priority: Major
>
> Currently in the Raft's leader implementation, we validate that follower
> fetch offsets increase monotonically. This protects the guarantees that Raft
> provides since a non-monotonic update means that the follower has lost
> committed data, which may or may not result in data loss. It depends whether
> the update also causes a non-monotonic update to the high watermark. If the
> fetch is from an observer, no harm done since observers do not affect the
> high watermark. If the fetch is from a voter and a majority of nodes
> (excluding the fetcher) have offsets larger than or equal to the high
> watermark, also no harm done. It's easy to check for these cases and log a
> warning instead of raising an error.
> The question then is what to do if we get a voter fetch which does cause the
> high watermark to regress? The problem is that there are some scenarios where
> data loss might be unavoidable. For example, a follower's disk might become
> corrupt and ultimately get replaced. Often the damage is already done by the
> time we get the Fetch request with the non-monotonic offset, so the stricter
> validation in fact just prevents recovery.
> It's worth noting also that the stricter validation by the leader cannot be
> relied on to detect data loss. It could be the case that a recovered voter
> restarts in the middle of an election. There is no general way that I'm aware
> of that lets us detect when a voter has lost previously committed data.
> With all of this mind, my conclusion is that it makes sense to loosen the
> validation in fetches. The leader can still ensure that its high watermark
> does not go backwards and we can still log a warning, but it should not
> prevent replicas from catching up after hard failures with disk state loss.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)