[ https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404245#comment-15404245 ]
Flavio Junqueira edited comment on KAFKA-1211 at 8/2/16 4:14 PM:
-----------------------------------------------------------------

[~junrao] let me ask a few clarification questions.

# Is it right that the scenarios described here do not affect the cases in which min isr > 1 and unclean leader election is disabled? If min isr is greater than 1 and the leader always comes from the latest isr, then the leader can either truncate the followers or have them fetch the missing log suffix.
# The main goal of the proposal is to have two replicas in a lossy configuration (e.g. min isr = 1, unclean leader election enabled), a leader and a follower, converge to a common prefix by choosing an offset based on a common generation. The chosen generation is the largest generation the two replicas have in common. Is that right? (See the sketch after this list.)
# How do we guarantee that the generation id is unique: by using zookeeper versions?
# I think there is a potential race between updating the leader-generation-checkpoint file and appending the first message of the generation. We might be better off rolling the log segment file and making the generation part of the log segment file name. This way, when we start a new generation, we also start a new file, and we know precisely when a message from that generation has been appended.
# Let's consider a scenario with 3 servers A, B, and C. I'm again assuming that it is ok to have a single server up to ack requests. Say we have the following execution:
||Generation||A||B||C||
|1| |m1|m1|
| | |m2|m2|
|2|m3| | |
| |m4| | |
Say that now A and B start generation 3. They have no generation in common, so they start from zero, dropping m1 and m2. Is that right? If later on C joins A and B, then it will also drop m1 and m2, right? Given that the configuration is lossy, it doesn't seem wrong to do this, as all we are trying to do is converge to a consistent state.
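To make the convergence in point 2 concrete, here is a minimal sketch of the reconciliation I have in mind. All names ({{ReplicaLog}}, {{truncationOffset}}, etc.) are invented for illustration; this is not Kafka's actual code. The {{TreeMap}} stands in for the leader-generation-checkpoint file, one (generation, start offset) entry per generation, and {{main()}} replays the A/B scenario from the table above:

{code:java}
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch only: invented names, not Kafka's actual implementation.
public class GenerationReconciliationSketch {

    // A replica's log, modelled as (generation -> start offset) plus a
    // log end offset (LEO).
    static class ReplicaLog {
        final NavigableMap<Integer, Long> generationStartOffsets = new TreeMap<>();
        long logEndOffset = 0L;

        void append(int generation, long offset) {
            generationStartOffsets.putIfAbsent(generation, offset);
            logEndOffset = offset + 1;
        }

        // End offset of a generation on this replica: the start offset of
        // the next generation if there is one, otherwise the LEO.
        long endOffsetOf(int generation) {
            Integer next = generationStartOffsets.higherKey(generation);
            return next == null ? logEndOffset : generationStartOffsets.get(next);
        }
    }

    // Offset the follower truncates to: walk the follower's generations
    // from the largest down, pick the first one the leader also has, and
    // converge on the smaller of the two end offsets for that generation.
    // With no common generation, truncate to zero.
    static long truncationOffset(ReplicaLog leader, ReplicaLog follower) {
        for (int gen : follower.generationStartOffsets.descendingKeySet()) {
            if (leader.generationStartOffsets.containsKey(gen)) {
                return Math.min(leader.endOffsetOf(gen), follower.endOffsetOf(gen));
            }
        }
        return 0L;
    }

    public static void main(String[] args) {
        ReplicaLog a = new ReplicaLog();             // A wrote in generation 2 only
        a.append(2, 0);                              // m3
        a.append(2, 1);                              // m4
        ReplicaLog b = new ReplicaLog();             // B wrote in generation 1 only
        b.append(1, 0);                              // m1
        b.append(1, 1);                              // m2
        // A and B share no generation, so B truncates to 0, dropping m1 and m2.
        System.out.println(truncationOffset(a, b));  // prints 0
    }
}
{code}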
> Hold the produce request with ack > 1 in purgatory until replicas' HW has
> larger than the produce offset
> --------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1211
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1211
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.11.0.0
>
>
> Today during leader failover we will have a weakness period when the
> followers truncate their data before fetching from the new leader, i.e.,
> the number of in-sync replicas is just 1. If during this time the leader
> has also failed, then produce requests with ack > 1 that have already been
> responded to may still be lost. To avoid this scenario we would prefer to
> hold the produce request in purgatory until the replicas' HW is larger than
> the offset, instead of just their end-of-log offsets.
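To illustrate the fix the issue describes, here is a minimal sketch, with invented names ({{ProduceAckSketch}}, {{ackByHighWatermark}}, etc.; not Kafka's actual purgatory code), contrasting the current end-of-log condition with the proposed high-watermark condition:

{code:java}
import java.util.List;

// Sketch only: invented names, not Kafka's actual purgatory implementation.
public class ProduceAckSketch {

    static class Replica {
        final long logEndOffset;   // LEO: offset of the next record to append
        final long highWatermark;  // HW: marks the committed prefix of the log
        Replica(long leo, long hw) { logEndOffset = leo; highWatermark = hw; }
    }

    // Condition described as today's behaviour: ack once every in-sync
    // replica has fetched past the produced offset. During failover a
    // follower may truncate and refetch, so an acked record can be lost.
    static boolean ackByLogEnd(List<Replica> isr, long produceOffset) {
        return isr.stream().allMatch(r -> r.logEndOffset > produceOffset);
    }

    // Proposed condition: keep the request in purgatory until every
    // in-sync replica's HW has advanced past the produced offset, so the
    // record is committed everywhere before the producer sees success.
    static boolean ackByHighWatermark(List<Replica> isr, long produceOffset) {
        return isr.stream().allMatch(r -> r.highWatermark > produceOffset);
    }

    public static void main(String[] args) {
        List<Replica> isr = List.of(new Replica(5, 3), new Replica(5, 4));
        System.out.println(ackByLogEnd(isr, 4));        // true: both fetched offset 4
        System.out.println(ackByHighWatermark(isr, 4)); // false: HWs still behind
    }
}
{code}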