[ https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404404#comment-15404404 ]

Jun Rao commented on KAFKA-1211:
--------------------------------

[~fpj], for #1 and #2, there are a couple of scenarios that this proposal can fix.
a. The first one is what's described in the original jira. Currently, when the 
follower truncates its log, it can truncate messages that were already 
committed. If the follower then immediately becomes the leader, those 
previously committed messages are lost. This is rare, but when it happens, it's 
bad (a short illustration follows after (b)). The proposal fixes this case by 
preventing the follower from unnecessarily truncating previously committed 
messages.
b. Another issue is that a portion of the log in different replicas may not 
match in certain failure cases. This can happen when unclean leader election is 
enabled. However, even if unclean leader election is disabled, mismatches can 
still happen when messages are lost due to a power outage (see KAFKA-3919). The 
proposal fixes this issue by making sure that the replicas are always identical.
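
To make scenario (a) concrete, here is a minimal, self-contained sketch 
(hypothetical class name, offsets and messages; not the actual broker code) of 
how truncating to a stale high watermark can drop committed messages:

    import java.util.ArrayList;
    import java.util.List;

    public class TruncationLossExample {
        public static void main(String[] args) {
            // The follower holds offsets 0..7; all of them are committed on the
            // leader (leader HW = 8), but the follower's own HW is stale at 5.
            List<String> followerLog = new ArrayList<>(
                List.of("m0", "m1", "m2", "m3", "m4", "m5", "m6", "m7"));
            int staleFollowerHighWatermark = 5;

            // Current behaviour: on restart or leader change the follower truncates
            // to its last-known HW, dropping the committed messages m5..m7.
            followerLog.subList(staleFollowerHighWatermark, followerLog.size()).clear();

            // If this follower now becomes leader before it re-fetches m5..m7,
            // those committed messages are lost.
            System.out.println("follower log after truncation: " + followerLog);
        }
    }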

For #3, the controller increases the leader generation every time the leader 
changes. The latest leader generation is persisted in ZK.
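
As a rough illustration of the state involved (field and class names are 
hypothetical, not the actual controller code), the persisted per-partition 
state would look roughly like this, with the generation bumped on every leader 
change:

    import java.util.List;

    // Hypothetical sketch of the per-partition leader state the controller
    // persists in ZK; names are illustrative only.
    final class PartitionLeaderState {
        final int leader;            // broker id of the current leader
        final int leaderGeneration;  // bumped by the controller on every leader change
        final List<Integer> isr;     // current in-sync replica set

        PartitionLeaderState(int leader, int leaderGeneration, List<Integer> isr) {
            this.leader = leader;
            this.leaderGeneration = leaderGeneration;
            this.isr = isr;
        }

        // On a leader change the controller persists a new state with generation + 1.
        PartitionLeaderState withNewLeader(int newLeader, List<Integer> newIsr) {
            return new PartitionLeaderState(newLeader, leaderGeneration + 1, newIsr);
        }
    }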

For #4, putting the leader generation in the segment file name is another 
possibility. One concern I had with that approach is dealing with compacted 
topics. After compaction, it's possible that only a small number of messages 
(or even just a single message) are left in a particular generation. Putting 
the generation id in the segment file name would force us to have tiny 
segments, which is not ideal. As for the race condition, we can avoid it even 
with a separate checkpoint file. The sequencing will be (see the sketch below):
(1) broker receives LeaderAndIsrRequest to become leader;
(2) broker stops fetching from the current leader;
(3) no new writes can happen to this replica at this point;
(4) broker writes the new leader generation and log end offset to the checkpoint file;
(5) broker marks the replica as leader;
(6) new writes can happen to this replica now.
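
A rough sketch of that sequencing (the interfaces and names below are 
stand-ins, not the actual broker classes):

    // Hypothetical stand-ins for the pieces involved in the sequencing above.
    interface ReplicaFetcher { void stopFetching(String topicPartition); }

    interface Partition {
        String topicPartition();
        long logEndOffset();
        void markAsLeader(int generation);
    }

    interface GenerationCheckpoint {
        void append(int generation, long startOffset);
        void flush();
    }

    final class BecomeLeaderSequence {
        // Steps (1)-(6) from above, in order; (1) is the LeaderAndIsrRequest arriving.
        static void becomeLeader(Partition partition, int newGeneration,
                                 ReplicaFetcher fetcher, GenerationCheckpoint checkpoint) {
            // (2) stop fetching from the current leader; (3) from here on no new
            // writes can reach this replica, so its log end offset is stable.
            fetcher.stopFetching(partition.topicPartition());
            long logEndOffset = partition.logEndOffset();

            // (4) persist the new generation together with the offset at which it starts.
            checkpoint.append(newGeneration, logEndOffset);
            checkpoint.flush();

            // (5) only now mark the replica as leader; (6) new writes are accepted
            // after this point, so they are all covered by the entry just written.
            partition.markAsLeader(newGeneration);
        }
    }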

For #5, it depends on who becomes the new leader in that case. If A becomes the 
new leader (generation 3), then B and C will remove m1 and m2 and copy m3 and 
m4 over from A. If B becomes the new leader, A will remove m3 and m4 and copy 
m1 and m2 over from B. In either case, the replicas will be identical.
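
To make that reconciliation concrete, here is a rough sketch of how a follower 
could find the offset at which its generation history diverges from the new 
leader's and truncate there before re-fetching (hypothetical names; in practice 
this would be driven by a request/response with the leader):

    import java.util.Map;
    import java.util.NavigableMap;

    final class GenerationReconciliation {
        // Each map holds generation -> first offset of that generation in the replica's
        // log. Returns the first offset at which the two histories diverge; the follower
        // truncates to this offset and then fetches from the leader starting there.
        static long divergenceOffset(NavigableMap<Integer, Long> leaderGenerations,
                                     long leaderLogEndOffset,
                                     NavigableMap<Integer, Long> followerGenerations,
                                     long followerLogEndOffset) {
            long agreeUpTo = 0L;
            for (Map.Entry<Integer, Long> e : followerGenerations.entrySet()) {
                Long leaderStart = leaderGenerations.get(e.getKey());
                if (leaderStart == null || !leaderStart.equals(e.getValue())) {
                    break;  // histories diverge from the start of this generation onwards
                }
                // Both logs contain this generation starting at the same offset; they
                // agree at least up to its end on whichever side is shorter.
                Integer nextLeaderGen = leaderGenerations.higherKey(e.getKey());
                Integer nextFollowerGen = followerGenerations.higherKey(e.getKey());
                long leaderEnd = nextLeaderGen == null
                    ? leaderLogEndOffset : leaderGenerations.get(nextLeaderGen);
                long followerEnd = nextFollowerGen == null
                    ? followerLogEndOffset : followerGenerations.get(nextFollowerGen);
                agreeUpTo = Math.min(leaderEnd, followerEnd);
            }
            return agreeUpTo;
        }
    }

Everything past the divergence point on the follower is discarded and 
re-fetched from the leader, which is what makes the replicas end up identical 
in either case.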

> Hold the produce request with ack > 1 in purgatory until replicas' HW has 
> larger than the produce offset
> --------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1211
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1211
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.11.0.0
>
>
> Today during leader failover we have a window of weakness when the 
> followers truncate their data before fetching from the new leader, i.e., the 
> number of in-sync replicas is just 1. If during this time the leader also 
> fails, then produce requests with ack > 1 that have already been responded to 
> will still be lost. To avoid this scenario we would prefer to hold the 
> produce request in purgatory until the replicas' HW is larger than the 
> produce offset, instead of just their log-end offsets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
