[jira] [Commented] (KAFKA-12181) Loosen monotonic fetch offset validation by raft leader

Jose Armando Garcia Sancio (Jira) Tue, 12 Jan 2021 11:56:08 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263653#comment-17263653
 ]


Jose Armando Garcia Sancio commented on KAFKA-12181:
----------------------------------------------------

> Currently the leader will not expose the new high watermark until it has 
> reached the LeaderChange message that is written at the start of the leader 
> epoch. 

Yes. This is true. In {{LeaderState}} the initial value is {{Optional.empty()}} 
and we don't update it until the new high watermark is greater than or equal to 
the starting offset of the current epoch:

https://github.com/apache/kafka/blob/7455b701028337b53fe434e4ff16a2881975584e/raft/src/main/java/org/apache/kafka/raft/LeaderState.java#L115

> Loosen monotonic fetch offset validation by raft leader
> -------------------------------------------------------
>
>                 Key: KAFKA-12181
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12181
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>
> Currently in the Raft's leader implementation, we validate that follower 
> fetch offsets increase monotonically. This protects the guarantees that Raft 
> provides since a non-monotonic update means that the follower has lost 
> committed data, which may or may not result in data loss. It depends whether 
> the update also causes a non-monotonic update to the high watermark. If the 
> fetch is from an observer, no harm done since observers do not affect the 
> high watermark. If the fetch is from a voter and a majority of nodes 
> (excluding the fetcher) have offsets larger than or equal to the high 
> watermark, also no harm done. It's easy to check for these cases and log a 
> warning instead of raising an error.
> The question then is what to do if we get a voter fetch which does cause the 
> high watermark to regress? The problem is that there are some scenarios where 
> data loss might be unavoidable. For example, a follower's disk might become 
> corrupt and ultimately get replaced. Often the damage is already done by the 
> time we get the Fetch request with the non-monotonic offset, so the stricter 
> validation in fact just prevents recovery. 
> It's worth noting also that the stricter validation by the leader cannot be 
> relied on to detect data loss. It could be the case that a recovered voter 
> restarts in the middle of an election. There is no general way that I'm aware 
> of that lets us detect when a voter has lost previously committed data.
> With all of this mind, my conclusion is that it makes sense to loosen the 
> validation in fetches. The leader can still ensure that its high watermark 
> does not go backwards and we can still log a warning, but it should not 
> prevent replicas from catching up after hard failures with disk state loss. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (KAFKA-12181) Loosen monotonic fetch offset validation by raft leader

Reply via email to