Disclaimer: I only have experience right now with 0.7.2 On Sun, Jul 7, 2013 at 11:35 AM, David James <davidcja...@gmail.com> wrote: > Sorry for the long email, but I've tried to keep it organized, at least. > > "Kafka has a modern cluster-centric design that offers strong > durability and fault-tolerance guarantees." and "Messages are > persisted on disk and replicated within the cluster to prevent data > loss." according to http://kafka.apache.org/. > > I'm trying to understand what this means in some detail. So, two questions. > > 1. Fault-Tolerance > > If a Broker in a Kafka cluster fails (the EC2 instance dies), what > happens? After, let's say I add a new Broker to the cluster (that my > responsibility, not Kafka's). What happens when it rejoins? > > To be more particular, if the cluster consists of a Zookeeper and B > (3, for example) Brokers, can a Kafka system guarantee to tolerate up > to B-1 (2, for example) Broker failures? > > 2. Durability at an application level > > What are the guarantees about durability, at an application level, in > practice? By "application level" I mean guarantees that a produced > message gets consumed and acted upon by an application that uses > Kafka. My understanding at present is that Kafka does not make these > kinds of guarantees because there are no acks. So, it is up to the > application developer to handle it. Is this right?
Yes. > > Here's my understanding: Having messages persisted on disk and > replicated is why Kafka has durability guarantees. But, from an > application perspective, what happens when a consumer pulls a message > but fails before acting on it? That would update the Kafka consumer > offset, right? So, without some thinking and planning ahead on the > Kafka system design, the application's consumers would not have a way > of knowing that a message was not actually processed. Don't update the offset until message has been fully processed. This means you need to build downstream systems that can accept messages being replayed, since a message may be processed but the consumer crash before the offset it updated. Or at least have a process in place to deal with clean-up, in the event of a crash. > > Conclusion / Last Question > > I'm interested in making the chance of message loss minimal, at a > system level. Any pointers on what to read or think about would be > appreciated! > > Thanks! > -David