Hi Jonathan,

On 10 Jun 2013, at 13:12, Jonathan Hodges wrote:

> Actually you don't need 100s GBs to reap the benefits of Kafka over Rabbit.
> Because Kafka doesn't centrally maintain state it can always manage higher
> message throughput more efficiently than Rabbit even when there is no
> messages persisted to disk.

Just out of curiosity, how does Kafka know when to remove/delete messages from disk? Is this just done whenever a message "falls off the end of the (circular) buffer" or is there more to it than that?

Also, when you say that Kafka doesn't centrally maintain state (at all), does that mean clients maintain their own view of where (in the server-held buffer) they're currently at - a kind of client-side cursor into the data? How does this translate into no random I/O? You can't have mapped the entire multi-terabyte sized store into memory using mmap, so does this simply mean that when a particular client is consuming data, you're relying on the OS to page in the relevant bits of the data store and relying on sendfile (under the covers) to flush that to the socket? Have I understood this correctly?

Sorry, BTW, if these are RTFM questions - I saw some bits in the docs, but I must admit I've not trawled the code for answers as yet.

> As you can see in the last graph of 10 million messages which is less than
> a GB on disk, the Rabbit throughput is capped around 10k/sec. Beyond
> throughput, with the pending release of 0.8, Kafka will also have
> advantages around message guarantees and durability.

Fascinating. What are those guarantees going to be? One of the reasons Rabbit runs a bit slower - one of several - when persisting data, is that each write is fsync'ed to disk, whereas Kafka relies on OS level flushing IIRC, providing a configurable parameter to force a flush after some defined number of messages, so as to avoid too much potential data loss in case of server failure.
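
To make the trade-off I have in mind concrete, here is a rough sketch. It is a toy in Python, not code from Rabbit or Kafka: `LogWriter`, `flush_every` and `write_messages` are names I've made up purely for illustration. It appends messages to a file, hands each one to the OS straight away, and forces an fsync either after every message or after every N messages. It then reports how many fsync calls were made and how many messages had not yet been fsync'ed at the moment the writer stopped.

```python
import os
import tempfile


class LogWriter:
    """Toy append-only log that calls fsync every `flush_every` messages."""

    def __init__(self, path, flush_every=1):
        self.flush_every = flush_every
        self.unsynced = 0      # messages handed to the OS but not yet fsync'ed
        self.fsync_calls = 0
        self.f = open(path, "ab")

    def append(self, message: bytes):
        self.f.write(message + b"\n")
        self.f.flush()         # hand the bytes to the OS (page cache)
        self.unsynced += 1
        if self.unsynced >= self.flush_every:
            self.sync()

    def sync(self):
        os.fsync(self.f.fileno())  # ask the OS to push the data to disk
        self.fsync_calls += 1
        self.unsynced = 0

    def close(self):
        if self.unsynced:
            self.sync()
        self.f.close()


def write_messages(n, flush_every):
    """Write n messages; return (fsync calls made, messages still unsynced)."""
    with tempfile.TemporaryDirectory() as d:
        w = LogWriter(os.path.join(d, "log"), flush_every)
        for i in range(n):
            w.append(b"msg-%d" % i)
        result = (w.fsync_calls, w.unsynced)  # snapshot before the final sync
        w.close()
        return result
```

With `flush_every=1` every message costs an fsync and nothing is left unsynced; with `flush_every=100` the number of fsync calls drops a hundredfold, but up to 99 messages at a time exist only in the OS cache and would be lost if the machine went down.
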
So in that respect, Rabbit has a higher guarantee of durability in its current incarnation, with the obvious caveat that doing so has an adverse effect on performance. When you say "message guarantees", are we talking about ordering, or delivery, or both? Very interested to hear about those.

Cheers,
Tim