[ https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419984#comment-15419984 ]
Flavio Junqueira commented on KAFKA-1211:
-----------------------------------------

[~junrao]

bq. the leader needs to first wait for the follower to receive a message before it can advance the last committed offset.

Makes sense.

bq. it can propagate the last committed offset to the follower

Makes sense.

bq. the last committed offset in the follower is always behind that in the leader

Makes sense: it is either equal or behind, never ahead.

bq. Since the follower truncates based on the local last committed offset, it's possible for the follower to truncate messages that are already committed by the leader.

I'm not sure why we are doing this. A follower can't truncate until it hears from the leader upon recovery; it shouldn't truncate based on its local last committed offset.

> Hold the produce request with ack > 1 in purgatory until replicas' HW has larger than the produce offset
> --------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1211
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1211
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.11.0.0
>
>
> Today during leader failover we will have a weakness period when the
> followers truncate their data before fetching from the new leader, i.e.,
> the number of in-sync replicas is just 1. If the leader also fails during
> this time, then produce requests with ack > 1 that have already been
> responded to will still be lost. To avoid this scenario we would prefer to
> hold the produce request in purgatory until the replicas' HW is larger than
> the offset, instead of just their end-of-log offsets.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)