[ 
https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083075#comment-14083075
 ] 

Jay Kreps commented on KAFKA-1555:
----------------------------------

I think we may be confusing a few things:
1. The guarantee we want to provide
2. The way we express this in the protocol
3. The way the user expresses their intent in config

-2 obviously makes no sense as a config. It is a bit of a hack as part of the 
protocol, but maybe you could argue for it. 

In terms of behavior, I understand -2 to be the equivalent of both -1 and 2 at 
once: all in-sync replicas acknowledge, and there are at least 2 of them.

The best way to do this would be to have a separate min.isr parameter. This 
could be either a topic-level config (with global default), or it could be a 
new parameter in the protocol. I don't think we want to try to cram it into an 
existing field, although I see why that is convenient.
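
To make the separate-parameter idea concrete, here is a minimal sketch, not actual broker code. The function name, argument names, and the set-based model of the ISR are all assumptions made for illustration. It treats "min.isr" as an extra condition on top of "all in-sync replicas have the message", and replays case 3 from the issue description, where only the leader A is left in the ISR.

```python
def can_acknowledge(isr, replicated_to, min_isr):
    """Decide whether a write may be acknowledged to the producer.

    isr           -- set of broker ids currently in sync (hypothetical model)
    replicated_to -- set of broker ids that already hold the message
    min_isr       -- hypothetical minimum ISR size (the "min.isr" idea)
    """
    # Every in-sync replica must hold the message (the acks=-1 condition).
    fully_committed = isr <= replicated_to
    # In addition, the ISR itself must be large enough.
    return fully_committed and len(isr) >= min_isr


# Case 3 from the issue: B and C have dropped out, only leader A is in sync.
isr = {"A"}
replicated_to = {"A"}

print(can_acknowledge(isr, replicated_to, min_isr=1))
print(can_acknowledge(isr, replicated_to, min_isr=2))
```

With min_isr=1 the write is acknowledged even though only one copy exists; with min_isr=2 it is refused until a second replica rejoins the ISR and catches up.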

One concern I have is that (whether or not we cram these two concepts together 
in the protocol) having two knobs makes it increasingly nuanced for 
users to express their intention around durability. Having previously worked 
with Dynamo-style R+W settings, my experience is that they mostly just 
confuse people.

An alternative might be to just change the behavior of acks=2. We have long 
felt that 0, 1, and "all in-sync" are really the settings that make sense, and 
we just allowed 2, 3, etc. for completeness. However, since there is no hard 
guarantee for 2, it is pretty hard to think of a time as a user where you would 
actually want 2 acknowledgements (i.e. you are okay with potentially losing 
data but have gathered enough empirical statistics to determine that the 
probability of loss at 1 was too high). Basically this use case doesn't really 
exist. If so, we should just change 2 to mean fully committed and |ISR| >= 2.
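
A rough sketch of why acks=2 gives no hard guarantee today, and what the changed meaning would buy. This is a toy model, not Kafka code: the three functions and the set-based state are hypothetical, and "current" acks=2 is modelled simply as "any two replicas have the message", following case 2 of the issue description.

```python
def acked_current(isr, has_message):
    # Toy model of today's acks=2: any two replicas holding the message suffice.
    return len(has_message) >= 2


def acked_proposed(isr, has_message):
    # Proposed meaning: fully committed (every ISR member has it) and |ISR| >= 2.
    return isr <= has_message and len(isr) >= 2


def loss_possible(isr, has_message, failed_leader):
    # After the leader fails, any remaining ISR member may be elected.
    # The message can be lost if some candidate does not hold it.
    candidates = isr - {failed_leader}
    return any(broker not in has_message for broker in candidates)


# Case 2 from the issue: A and B have m, C is still in the ISR without it.
isr = {"A", "B", "C"}
has_m = {"A", "B"}

print(acked_current(isr, has_m), loss_possible(isr, has_m, "A"))
print(acked_proposed(isr, has_m))

# Once C has caught up, the proposed rule acknowledges and no candidate lacks m.
has_m = {"A", "B", "C"}
print(acked_proposed(isr, has_m), loss_possible(isr, has_m, "A"))
```

In this model the current rule acknowledges a message that a later leader election can drop, while the proposed rule only acknowledges once every electable replica holds it.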

> provide strong consistency with reasonable availability
> -------------------------------------------------------
>
>                 Key: KAFKA-1555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.8.1.1
>            Reporter: Jiang Wu
>            Assignee: Neha Narkhede
>
> In a mission critical application, we expect a kafka cluster with 3 brokers 
> can satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases, such as when two brokers are down, service can be blocked, 
> but no message loss happens.
> We found that the current kafka version (0.8.1.1) cannot achieve the 
> requirements due to three of its behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with fewer 
> messages may be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it 
> has fewer messages than the leader.
> 3. ISR can contain only 1 broker, therefore acknowledged messages may be 
> stored in only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are 
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a 
> message m to A, and receives an acknowledgement. Disk failure happens in A 
> before B and C replicate m. Message m is lost.
> In summary, any existing configuration cannot satisfy the requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
