Hi Jonathan,

Thanks for replying - that's all useful info.
On 10 Jun 2013, at 14:19, Jonathan Hodges wrote:

> Kafka has a configurable rolling window of time it keeps the messages per
> topic. The default is 7 days and after this time the messages are removed
> from disk by the broker.
>
> Correct, the consumers maintain their own state via what are known as
> offsets. Also true that when producers/consumers contact the broker there
> is a random seek to the start of the offset, but the majority of access
> patterns are linear.

So, just to be clear, the distinction that has been raised on this thread
is only part of the story, viz. the difference in message rates between
RabbitMQ and Kafka. Essentially, these two systems are performing
completely different tasks, since in RabbitMQ the concept of a long-term
persistent topic, whose entries are removed solely based on an expiration
policy, is somewhat alien. RabbitMQ will delete messages from its message
store as soon as a relevant consumer has seen and ACK'ed them, which
*requires* tracking consumer state in the broker.

I suspect this was your (earlier) point about Kafka /not/ trying to be a
general purpose message broker, but having an architecture that is highly
tuned to a specific set of usage patterns.

>> As you can see in the last graph of 10 million messages which is less
>> than a GB on disk, the Rabbit throughput is capped around 10k/sec.
>> Beyond throughput, with the pending release of 0.8, Kafka will also
>> have advantages around message guarantees and durability.

[snip]

> Correct, with 0.8 Kafka will have options similar to Rabbit's fsync
> configuration option.

Right, but just to be clear, unless Kafka starts to fsync for every single
published message, you are /not/ going to offer the same guarantee. In
this respect, Rabbit is clearly putting safety above performance when
that's what users ask it for, which is fine for some cases and not for
others.

By way of example, if you're using producer/publisher confirms with
RabbitMQ (there's a rough sketch of this further down), the broker will
not ACK receipt of a message until (a) it has been fsync'ed to disk and
(b) if the queue is mirrored, each mirror has acknowledged receipt of the
message. Again, unless you're fsync-ing to disk on each publish, the
guarantees will be different - and rightly so, since you can deal with
re-publishing and de-duplication quite happily in a system that's handling
a 7-day sliding window of data, so ensuring throughput is more useful (in
that case) than avoiding data loss on the server.

Of course, architecturally, fsync-ing very regularly will kill the
benefits that mmap combined with sendfile gives you, since relying on the
kernel's paging / caching capabilities is the whole point of doing that.
That's not intended as a criticism btw, just an observation about the two
systems' differing approaches.

> Messages have always had ordering guarantees, but with 0.8 there is the
> notion of topic replicas similar to replication factor in Hadoop or
> Cassandra.
>
> http://www.slideshare.net/junrao/kafka-replication-apachecon2013
>
> With configuration you can trade off latency for durability with 3
> options:
>
> - Producer receives no acks (no network delay)
> - Producer waits for ack from broker leader (1 network roundtrip)
> - Producer waits for quorum ack (2 network roundtrips)

Sounds very interesting, I'll take a look.
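If I've understood the slides correctly, I'd guess the producer-side knob
ends up looking something like this - only a sketch, mind, since 0.8 isn't
released yet and the details may shift; the broker address, topic name and
serializer here are invented:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class AckModes {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // The latency/durability trade-off from the slides:
            //    0 -> no ack (fire and forget, no network delay)
            //    1 -> ack from the partition leader (1 roundtrip)
            //   -1 -> ack from the quorum (2 roundtrips)
            props.put("request.required.acks", "-1");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("events", "hello"));
            producer.close();
        }
    }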
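And for comparison, here is the publisher confirms flow I mentioned above,
roughly, using the RabbitMQ Java client - a minimal sketch rather than
production code, with the queue name invented:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.MessageProperties;

    public class Confirms {
        public static void main(String[] args) throws Exception {
            Connection conn = new ConnectionFactory().newConnection();
            Channel ch = conn.createChannel();

            ch.confirmSelect();  // put the channel into confirm mode
            ch.queueDeclare("events", true, false, false, null);  // durable

            // Delivery-mode 2 (persistent), so the broker must write the
            // message to the store (and fsync) before it will confirm it.
            ch.basicPublish("", "events",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "hello".getBytes());

            // Block until the broker has confirmed everything published so
            // far; throws if anything is nacked.
            ch.waitForConfirmsOrDie();

            ch.close();
            conn.close();
        }
    }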
> With the combination of quorum commits and consumers managing state you
> can get much closer to exactly once guarantees, i.e. the consumers can
> manage their consumption state as well as the consumed messages in the
> same transaction.

Hmn. This idea (of exactly once delivery) has long been debated in the
rabbit community. For example,
http://rabbitmq.1065348.n5.nabble.com/Exactly-Once-Delivery-td16826.html
covers a number of the objections raised against doing this, though again,
since Kafka is addressing a different problem space, perhaps the
constraints differ somewhat.

Cheers,
Tim
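P.S. For what it's worth, I read the "same transaction" idea as something
like the following - the processed message and the consumption state
committed together, so they can never disagree after a crash. The JDBC
schema here ("results" and "offsets" tables) is hypothetical, purely to
illustrate:

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public class ExactlyOnceSketch {
        // Process one message and advance the stored offset in the same
        // local transaction. On restart, the consumer resumes from the
        // offset recorded in the database, so a replayed message is either
        // fully applied or not applied at all.
        static void handle(Connection db, String topic, int part,
                           long offset, String payload) throws Exception {
            db.setAutoCommit(false);
            try {
                PreparedStatement ins = db.prepareStatement(
                        "INSERT INTO results(payload) VALUES (?)");
                ins.setString(1, payload);
                ins.executeUpdate();

                PreparedStatement off = db.prepareStatement(
                        "UPDATE offsets SET next_offset = ? " +
                        "WHERE topic = ? AND part = ?");
                off.setLong(1, offset + 1);
                off.setString(2, topic);
                off.setInt(3, part);
                off.executeUpdate();

                db.commit();  // state and side effect move together
            } catch (Exception e) {
                db.rollback();  // re-read from the stored offset on restart
                throw e;
            }
        }
    }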