[ https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148483#comment-14148483 ]
Jay Kreps commented on KAFKA-1555:
----------------------------------

1. Yeah, this may have been a mistake. I think integers are fine for the protocol. We need not expose them in this way in the client config.

2. Yeah, I think what I said wasn't very clear. You are correct that the loss situation is always an unclean election. I think your complaint is that min.isr is hard to reason about. I actually agree. What I was trying to say was that you can't totally prevent data loss either by replication or by min.isr because, regardless of the number of writes, there is some chance of more failures than that. So fundamentally, using the number of writes/replicas is a way to increase the probability of durability. Both are hard to reason about, but I don't know if min.isr is worse than replication factor in that respect.

3. Which of the four options are you saying you like?

4. Yes, I totally agree. Let me elaborate. I claim there are really only two common cases here: (a) you have the ability to block and wait for sufficient available replicas, holding onto whatever unwritten data in the meantime, (b) you don't. I think probably the majority of uses are (b); for example, any data produced by a web application would be this way. But there are plenty of cases where you can block (a stream processing job reading from an upstream topic, or when replicating data coming out of a database, etc).

min.isr only helps the case where you can block. So the only sane way to use min.isr is to also set retries to infinity and keep trying until you are sure the write has succeeded. But if we don't fail fast on the write, imagine how this would work in practice. A server fails, bringing you below your min.isr setting, and for the hour while someone is fixing it your process is sitting there pumping out duplicate writes. This is likely not what anyone would want. But as you say, you can't guarantee the data wasn't written, because the isr could shrink after the write occurred.
This would be rare, but possible. However, attempting to fail fast, even though it isn't perfect, fixes the problem: it is possible there may be a duplicate, but that is true anyway just due to network timeouts, and there won't be a billion duplicates, which is what I consider to be the blocker issue.

> provide strong consistency with reasonable availability
> -------------------------------------------------------
>
> Key: KAFKA-1555
> URL: https://issues.apache.org/jira/browse/KAFKA-1555
> Project: Kafka
> Issue Type: Improvement
> Components: controller
> Affects Versions: 0.8.1.1
> Reporter: Jiang Wu
> Assignee: Gwen Shapira
> Fix For: 0.8.2
>
> Attachments: KAFKA-1555.0.patch, KAFKA-1555.1.patch,
> KAFKA-1555.2.patch, KAFKA-1555.3.patch, KAFKA-1555.4.patch, KAFKA-1555.5.patch
>
>
> In a mission critical application, we expect that a kafka cluster with 3
> brokers can satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases, such as when two brokers are down, service can be blocked,
> but no message loss happens.
> We found that the current kafka version (0.8.1.1) cannot achieve the
> requirements due to three of its behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with fewer
> messages may be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it
> has fewer messages than the leader.
> 3. ISR can contain only 1 broker, therefore acknowledged messages may be
> stored in only 1 broker.
> The following is an analytical proof.
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume
> that at the beginning, all 3 replicas, leader A, followers B and C, are in
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this
> time, although C hasn't received m, C is still in ISR. If A is killed, C can
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a
> message m to A, and receives an acknowledgement. Disk failure happens in A
> before B and C replicate m. Message m is lost.
> In summary, no existing configuration can satisfy the requirements.
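
To make the acks=-1 case from the issue description concrete, here is a minimal sketch in Python. It is a toy model, not Kafka code: the `Partition` class, `produce`, `fail_leader`, and the `min_isr` parameter are hypothetical names invented for illustration. It assumes a write is acknowledged once every replica currently in the ISR has it, and that a min.isr-style check rejects the write before appending when the ISR is too small.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition: per-replica logs plus the ISR set."""
    logs: dict = field(default_factory=lambda: {"A": [], "B": [], "C": []})
    isr: set = field(default_factory=lambda: {"A", "B", "C"})
    leader: str = "A"

    def produce(self, msg, replicated_to=(), min_isr=1):
        """Try to write msg; return True if the write is acknowledged.

        Assumed acks=-1 semantics: acknowledged once every replica in the
        current ISR has the message. The min_isr check is the hypothetical
        fail-fast step: reject before appending if the ISR is too small.
        """
        if len(self.isr) < min_isr:
            return False  # fail fast, nothing is written
        for replica in {self.leader, *replicated_to}:
            self.logs[replica].append(msg)
        return all(msg in self.logs[r] for r in self.isr)

    def fail_leader(self):
        """Lose the leader and its disk; elect a surviving ISR member, if any."""
        dead = self.leader
        self.isr.discard(dead)
        del self.logs[dead]
        self.leader = min(self.isr) if self.isr else None


def acks_all_scenario(min_isr):
    """Replay case 3 of the issue: B and C are out of the ISR, then A dies."""
    p = Partition()
    p.isr = {"A"}                       # B and C restarted and left the ISR
    acked = p.produce("m", min_isr=min_isr)
    p.fail_leader()                     # disk failure on A before replication
    survives = any("m" in log for log in p.logs.values())
    return acked, survives


print(acks_all_scenario(min_isr=1))
print(acks_all_scenario(min_isr=2))
```

With `min_isr=1` the write is acknowledged and then lost when A fails. With `min_isr=2` the write is rejected up front, so nothing that was acknowledged is lost and the producer knows to hold onto the data and retry.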