[ https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406915#comment-15406915 ]
Jun Rao commented on KAFKA-1211:
--------------------------------

[~fpj], very good questions.

1. Yes, the idea is for the follower to copy the LGS from the leader. As for the possibility of ending up in an inconsistent state: we just need to make sure the log is consistent with the local leader-generation checkpoint file up to the log end offset. One potential issue with the current proposal is when the follower truncates the log and then flushes the checkpoint file. If the follower crashes at this point and the truncation hasn't been flushed, we may treat some of the messages after the truncation point as being in the wrong leader generation. To fix that, we can change the protocol a bit. The basic idea is that the follower never flushes the checkpoint ahead of the log. Specifically, when the follower gets the LGS from the leader, it stores it in memory. After truncation, the follower only flushes the prefix of the LGS whose start offsets are at or below the log end offset. As the follower fetches data, every time the fetched messages cross a leader generation boundary (according to the cached LGS), the follower adds a new leader generation entry to the checkpoint file and flushes it.

2. The LLG doesn't have to be persisted and only needs to be cached in memory. The point of the LLG is really to detect any leader generation change since the follower issued the RetreiveLeaderGeneration request. Once such a change is detected, the follower can handle it properly. If the follower crashes and restarts, it can always re-fetch the LLG from the current leader.

> Hold the produce request with ack > 1 in purgatory until replicas' HW has larger than the produce offset
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1211
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1211
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.11.0.0
>
>
> Today during leader failover there is a window of weakness when the followers truncate their data before fetching from the new leader, i.e., the number of in-sync replicas is just 1. If the leader also fails during this time, produce requests with ack > 1 that have already been responded to will still be lost. To avoid this scenario we would prefer to hold the produce request in purgatory until the replicas' HW is larger than the offset, instead of just their end-of-log offsets.
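To make the flushing rule from point 1 above concrete, here is a minimal sketch of the idea: the follower caches the LGS in memory, persists only the prefix already covered by its (truncated) log, and extends the checkpoint each time fetched data crosses a generation boundary. This is not the actual Kafka implementation; the class and method names (LeaderGenerationEntry, FollowerLgsCheckpoint, flushCoveredPrefix, writeAndFsyncCheckpoint) are hypothetical and chosen only for illustration.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for illustration; not actual Kafka code.
class LeaderGenerationEntry {
    final int generation;    // leader generation
    final long startOffset;  // first offset produced in this generation
    LeaderGenerationEntry(int generation, long startOffset) {
        this.generation = generation;
        this.startOffset = startOffset;
    }
}

class FollowerLgsCheckpoint {
    private final List<LeaderGenerationEntry> cachedLgs; // LGS from the leader, ordered by start offset, memory only
    private int flushedCount = 0; // how many leading entries are already in the checkpoint file
    private long logEndOffset;    // last offset present in the local log (simplified)

    FollowerLgsCheckpoint(List<LeaderGenerationEntry> lgsFromLeader, long logEndOffsetAfterTruncation) {
        this.cachedLgs = new ArrayList<>(lgsFromLeader);
        this.logEndOffset = logEndOffsetAfterTruncation;
        // After truncating the log, flush only the prefix of the LGS that the local log already covers.
        flushCoveredPrefix();
    }

    // Called after a batch of fetched messages has been appended to the local log.
    void onMessagesAppended(long newLogEndOffset) {
        logEndOffset = newLogEndOffset;
        // If the appended messages crossed one or more leader generation boundaries
        // (according to the cached LGS), extend the checkpoint file and flush it.
        flushCoveredPrefix();
    }

    private void flushCoveredPrefix() {
        boolean extended = false;
        while (flushedCount < cachedLgs.size()
                && cachedLgs.get(flushedCount).startOffset <= logEndOffset) {
            flushedCount++;
            extended = true;
        }
        if (extended) {
            writeAndFsyncCheckpoint(cachedLgs.subList(0, flushedCount));
        }
    }

    private void writeAndFsyncCheckpoint(List<LeaderGenerationEntry> entries) {
        // Placeholder: persist `entries` to the leader-generation checkpoint file and fsync it.
        // Every persisted entry's start offset is already covered by the local log, so the
        // checkpoint never runs ahead of the log, even if the follower crashes right after this.
    }
}
{code}

Under these assumptions, a crash between truncating the log and flushing it leaves a checkpoint file that only describes offsets the log still contains, so no surviving message can be attributed to the wrong leader generation on restart.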