Sorry for the long email, but I've tried to keep it organized, at least.

"Kafka has a modern cluster-centric design that offers strong
durability and fault-tolerance guarantees." and "Messages are
persisted on disk and replicated within the cluster to prevent data
loss." according to http://kafka.apache.org/.

I'm trying to understand what this means in some detail. So, two questions.

1. Fault-Tolerance

If a Broker in a Kafka cluster fails (say, the EC2 instance dies), what
happens? Afterwards, suppose I add a new Broker to the cluster (that's my
responsibility, not Kafka's). What happens when it joins?

To be more specific: if the cluster consists of a ZooKeeper node and B
Brokers (3, for example), can Kafka guarantee tolerance of up to B-1
Broker failures (2, in this example)?
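
To make that concrete, here's roughly how I picture setting up a replicated
topic (a sketch using the newer Java AdminClient; the broker addresses, topic
name, and partition count are placeholders I made up):

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    import java.util.Collections;
    import java.util.Properties;

    public class CreateReplicatedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Placeholder addresses for the three Brokers in the cluster.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                      "broker1:9092,broker2:9092,broker3:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Replication factor 3: every partition should have a copy
                // on each of the three Brokers.
                NewTopic topic = new NewTopic("events", 6, (short) 3);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }

My assumption is that with a replication factor of 3, losing one Broker (or
even two?) shouldn't lose data, because a surviving replica still has it. Is
that the right way to think about it?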

2. Durability at an application level

What are the guarantees about durability, at an application level, in
practice? By "application level" I mean guarantees that a produced
message gets consumed and acted upon by an application that uses
Kafka. My understanding at present is that Kafka does not make these
kinds of guarantees because there are no acks. So, it is up to the
application developer to handle it. Is this right?
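
To show what I mean by acks, here's the producer-side setting I've seen in
the newer Java producer (a sketch; if nothing like this exists in the version
we're running, that would answer part of my question):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class DurableProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // "all" should mean the send is only acknowledged once the
            // in-sync replicas have the message.
            props.put(ProducerConfig.ACKS_CONFIG, "all");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Block on get() so an unacknowledged send surfaces as an exception.
                producer.send(new ProducerRecord<>("events", "key", "value")).get();
            } catch (Exception e) {
                // If the ack never arrives, the application has to decide
                // what to do (retry, log, give up).
                e.printStackTrace();
            }
        }
    }

Even with that, the ack only tells me a Broker has the message, not that any
consumer ever acted on it, which is the gap I'm asking about below.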

Here's my understanding: messages being persisted on disk and replicated is
where Kafka's durability guarantees come from. But from an application
perspective, what happens when a consumer pulls a message but fails before
acting on it? That would still update the consumer's offset, right? So,
without some planning ahead in the design of the system around Kafka, the
application's consumers would have no way of knowing that a message was
never actually processed.
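
The pattern I have in mind to close that gap is to commit the offset only
after the message has been processed. A sketch with the newer Java consumer,
assuming auto-commit can be disabled (the topic, group id, and process() step
are placeholders):

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class AtLeastOnceConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-app");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Don't let the client advance the offset on its own.
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singleton("events"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record); // the application's own "act on it" step
                    }
                    // Commit only after processing, so a crash before this
                    // point means the messages get re-delivered.
                    consumer.commitSync();
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            System.out.println(record.value());
        }
    }

As I understand it, the trade-off is that a crash between process() and
commitSync() means the same message gets processed again on restart, so the
application has to tolerate duplicates.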

Conclusion / Last Question

I'm interested in minimizing the chance of message loss at the system level.
Any pointers on what to read or think about would be appreciated!

Thanks!
-David
