[ https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404245#comment-15404245 ]
Flavio Junqueira edited comment on KAFKA-1211 at 8/2/16 4:14 PM:
-----------------------------------------------------------------

[~junrao] let me ask a few clarification questions.

# Is it right that the scenarios described here do not affect the cases in which min isr > 1 and unclean leader election is disabled? If min isr is greater than 1 and the leader always comes from the latest isr, then the leader can either truncate the followers or have them fetch the missing log suffix.
# The main goal of the proposal is to have two replicas in a lossy configuration (e.g. min isr = 1, unclean leader election enabled), a leader and a follower, converge to a common prefix by choosing an offset based on a common generation. The chosen generation is the largest generation the two replicas have in common. Is that right? (See the sketch after this list.)
# How do we guarantee that the generation id is unique: by using zookeeper versions?
# I think there is a potential race between updating the leader-generation-checkpoint file and appending the first message of the generation. We might be better off rolling the log segment file and making the generation part of the log segment file name. This way, when we start a new generation, we also start a new file, and we know precisely when a message from that generation has been appended.
# Let's consider a scenario with 3 servers A, B, and C. I'm again assuming that it is ok to have a single server up to ack requests. Say we have the following execution:
||Generation||A||B||C||
|1| |m1|m1|
| | |m2|m2|
|2|m3| | |
| |m4| | |
Say that now A and B start generation 3. They have no generation in common, so they start from zero, dropping m1 and m2. Is that right? If later on C joins A and B, then it will also drop m1 and m2, right? Given that the configuration is lossy, it doesn't seem wrong to do this, as all we are trying to do is converge to a consistent state.
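To make the convergence in point 2 concrete, here is a minimal sketch of the reconciliation I have in mind. All names ({{ReplicaLog}}, {{truncationOffset}}, etc.) are invented for illustration; this is not Kafka's actual code. The {{TreeMap}} stands in for the leader-generation-checkpoint file, one (generation, start offset) entry per generation, and {{main()}} replays the A/B scenario from the table above:

{code:java}
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch only: invented names, not Kafka's actual implementation.
public class GenerationReconciliationSketch {

    // A replica's log, modelled as (generation -> start offset) plus a
    // log end offset (LEO).
    static class ReplicaLog {
        final NavigableMap<Integer, Long> generationStartOffsets = new TreeMap<>();
        long logEndOffset = 0L;

        void append(int generation, long offset) {
            generationStartOffsets.putIfAbsent(generation, offset);
            logEndOffset = offset + 1;
        }

        // End offset of a generation on this replica: the start offset of
        // the next generation if there is one, otherwise the LEO.
        long endOffsetOf(int generation) {
            Integer next = generationStartOffsets.higherKey(generation);
            return next == null ? logEndOffset : generationStartOffsets.get(next);
        }
    }

    // Offset the follower truncates to: walk the follower's generations
    // from the largest down, pick the first one the leader also has, and
    // converge on the smaller of the two end offsets for that generation.
    // With no common generation, truncate to zero.
    static long truncationOffset(ReplicaLog leader, ReplicaLog follower) {
        for (int gen : follower.generationStartOffsets.descendingKeySet()) {
            if (leader.generationStartOffsets.containsKey(gen)) {
                return Math.min(leader.endOffsetOf(gen), follower.endOffsetOf(gen));
            }
        }
        return 0L;
    }

    public static void main(String[] args) {
        ReplicaLog a = new ReplicaLog();             // A wrote in generation 2 only
        a.append(2, 0);                              // m3
        a.append(2, 1);                              // m4
        ReplicaLog b = new ReplicaLog();             // B wrote in generation 1 only
        b.append(1, 0);                              // m1
        b.append(1, 1);                              // m2
        // A and B share no generation, so B truncates to 0, dropping m1 and m2.
        System.out.println(truncationOffset(a, b));  // prints 0
    }
}
{code}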
> Hold the produce request with ack > 1 in purgatory until replicas' HW has
> larger than the produce offset
> --------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1211
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1211
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>             Fix For: 0.11.0.0
>
>
> Today during leader failover we will have a weakness period when the
> followers truncate their data before fetching from the new leader, i.e.,
> the number of in-sync replicas is just 1. If during this time the leader
> has also failed, then produce requests with ack > 1 that have already been
> responded to may still be lost. To avoid this scenario we would prefer to
> hold the produce request in purgatory until the replicas' HW is larger than
> the offset, instead of just their end-of-log offsets.
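To illustrate the fix the issue describes, here is a minimal sketch, with invented names ({{ProduceAckSketch}}, {{ackByHighWatermark}}, etc.; not Kafka's actual purgatory code), contrasting the current end-of-log condition with the proposed high-watermark condition:

{code:java}
import java.util.List;

// Sketch only: invented names, not Kafka's actual purgatory implementation.
public class ProduceAckSketch {

    static class Replica {
        final long logEndOffset;   // LEO: offset of the next record to append
        final long highWatermark;  // HW: marks the committed prefix of the log
        Replica(long leo, long hw) { logEndOffset = leo; highWatermark = hw; }
    }

    // Condition described as today's behaviour: ack once every in-sync
    // replica has fetched past the produced offset. During failover a
    // follower may truncate and refetch, so an acked record can be lost.
    static boolean ackByLogEnd(List<Replica> isr, long produceOffset) {
        return isr.stream().allMatch(r -> r.logEndOffset > produceOffset);
    }

    // Proposed condition: keep the request in purgatory until every
    // in-sync replica's HW has advanced past the produced offset, so the
    // record is committed everywhere before the producer sees success.
    static boolean ackByHighWatermark(List<Replica> isr, long produceOffset) {
        return isr.stream().allMatch(r -> r.highWatermark > produceOffset);
    }

    public static void main(String[] args) {
        List<Replica> isr = List.of(new Replica(5, 3), new Replica(5, 4));
        System.out.println(ackByLogEnd(isr, 4));        // true: both fetched offset 4
        System.out.println(ackByHighWatermark(isr, 4)); // false: HWs still behind
    }
}
{code}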