Looks great, Gwen. I've added a few comments to the ticket. On Mon, Oct 20, 2014 at 2:32 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> Hi Kyle, > > I added new documentation, which will hopefully help. Please take a look > here: > https://issues.apache.org/jira/browse/KAFKA-1555 > > I've heard rumors that you are very very good at documenting, so I'm > looking forward to your comments. > > Note that I'm completely ignoring the acks>1 case since we are about > to remove it. > > Gwen > > On Wed, Oct 15, 2014 at 1:21 PM, Kyle Banker <kyleban...@gmail.com> wrote: > > Thanks very much for these clarifications, Gwen. > > > > I'd recommend modifying the following phrase describing "acks=-1": > > > > "This option provides the best durability, we guarantee that no messages > > will be lost as long as at least one in sync replica remains." > > > > The "as long as at least one in sync replica remains" is such a huge > > caveat. It should be noted that "acks=-1" provides no actual durability > > guarantees unless min.isr is also used to specify a majority of replicas. > > > > In addition, I was curious if you might comment on my other recent > posting > > "Consistency and Availability on Node Failures" and possibly add this > > scenario to the docs. With acks=-1 and min.isr=2 and a 3-replica topic > in a > > 12-node Kafka cluster, there's a relatively high probability that losing > 2 > > nodes from this cluster will result in an inability to write to the > cluster. > > > > On Tue, Oct 14, 2014 at 4:50 PM, Gwen Shapira <gshap...@cloudera.com> > wrote: > > > >> ack = 2 *will* throw an exception when there's only one node in ISR. > >> > >> The problem with ack=2 is that if you have 3 replicas and you got acks > >> from 2 of them, the one replica which did not get the message can > >> still be in ISR and get elected as leader, leading for a loss of the > >> message. If you specify ack=3, you can't tolerate the failure of a > >> single replica. Not amazing either. > >> > >> To makes things even worse, when specifying the number of acks you > >> want, you don't always know how many replicas the topic should have, > >> so its difficult to pick the correct number. > >> > >> acks = -1 solves that problem (since all messages need to get acked by > >> all replicas), but introduces the new problem of not getting an > >> exception if ISR shrank to 1 replica. > >> > >> Thats why the min.isr configuration was added. > >> > >> I hope this clarifies things :) > >> I'm planning to add this to the docs in a day or two, so let me know > >> if there are any additional explanations or scenarios you think we > >> need to include. > >> > >> Gwen > >> > >> On Tue, Oct 14, 2014 at 12:27 PM, Scott Reynolds <sreyno...@twilio.com> > >> wrote: > >> > A question about 0.8.1.1 and acks. I was under the impression that > >> setting > >> > acks to 2 will not throw an exception when there is only one node in > ISR. > >> > Am I incorrect ? Thus the need for min_isr. > >> > > >> > On Tue, Oct 14, 2014 at 11:50 AM, Kyle Banker <kyleban...@gmail.com> > >> wrote: > >> > > >> >> It's quite difficult to infer from the docs the exact techniques > >> required > >> >> to ensure consistency and durability in Kafka. I propose that we add > a > >> doc > >> >> section detailing these techniques. I would be happy to help with > this. > >> >> > >> >> The basic question is this: assuming that I can afford to temporarily > >> halt > >> >> production to Kafka, how do I ensure that no message written to > Kafka is > >> >> ever lost under typical failure scenarios (i.e., the loss of a single > >> >> broker)? > >> >> > >> >> Here's my understanding of this for Kafka v0.8.1.1: > >> >> > >> >> 1. Create a topic with a replication factor of 3. > >> >> 2. Use a sync producer and set acks to 2. (Setting acks to -1 may > >> >> successfully write even in a case where the data is written only to a > >> >> single node). > >> >> > >> >> Even with these two precautions, there's always the possibility of an > >> >> "unclean leader election." Can data loss still occur in this > scenario? > >> Is > >> >> it possible to achieve this level of durability on v0.8.1.1? > >> >> > >> >> In Kafka v0.8.2, in addition to the above: > >> >> > >> >> 3. Ensure that the triple-replicated topic also disallows unclean > leader > >> >> election (https://issues.apache.org/jira/browse/KAFKA-1028). > >> >> > >> >> 4. Set the min.isr value of the producer to 2 and acks to -1 ( > >> >> https://issues.apache.org/jira/browse/KAFKA-1555). The producer will > >> then > >> >> throw an exception if data can't be written to 2 out of 3 nodes. > >> >> > >> >> In addition to producer configuration and usage, there are also > >> monitoring > >> >> and operations considerations for achieving durability and > consistency. > >> As > >> >> those are rather nuanced, it'd probably be easiest to just start > >> iterating > >> >> on a document to flesh those out. > >> >> > >> >> If anyone has any advice on how to better specify this, or how to get > >> >> started on improving the docs, I'm happy to help out. > >> >> > >> >