Hi Daniel,

At the moment redis is a SPOF (single point of failure) in the architecture, but you can set up replication, and I'm seriously looking into using redis cluster to eliminate this. Some docs that cover this:

http://redis.io/topics/cluster-tutorial
http://redis.io/topics/sentinel
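For reference, a minimal sentinel setup for automatic failover might look something like this (the master name, address and timeouts below are placeholders, not anything from kafka-fast):

```conf
# sentinel.conf sketch: monitor a master named "mymaster" at 127.0.0.1:6379,
# require 2 sentinels to agree (quorum) before starting a failover
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```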
Consumer:

Consumption is split into logical work units of, by default, 10K offsets. If redis fails, the messages currently read will all be consumed, while the redis connection threads go mad trying to reconnect. No data loss should occur (albeit I'm still setting up scenarios in which I will test this).

The consumer threads read work assignments from a redis list using brpoplpush (see http://redis.io/commands/BRPOPLPUSH), which atomically pushes the current work unit onto a "working" queue. If any errors are encountered while consuming, the current work unit is not marked done; on recovery (startup) these "working" queues are scanned and any work units found are placed back onto the primary queue, ready for consumption. If an error message from kafka is received, e.g. error-code 6, the consumer will try to recreate the metadata for a work unit; while this is done the work unit stays on a "working" queue.

Another idea I'm going to play with is disabling consumption per machine/topic, to allow a machine to completely drain its messages in a normal operational way, so that a safe shutdown can be scheduled for any machine without having to rely on multiple threads synching etc.

Producer:

All sends are async and a broker is randomly selected. The producer has a retry cache (implemented using http://www.mapdb.org/); if any failure occurs during sending (network or connection errors), the message is saved and retried. If acks == 1 and a failure response from kafka is received, the message is retried N times. The low-level producer api allows you to bypass all this logic and send to any broker you want; the message send is still async, but you can wait/block on a response from the server and then react as required.

Auditing:

There is an event channel (clojure.core.async) that receives the work units consumed, with a status field for errors. This can be used to audit consumers; I normally send this data to a riemann instance, or just to disk.
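To make the brpoplpush work-unit flow concrete, here is a rough in-memory Python simulation of the pattern (the real library uses redis lists and Clojure; the `WorkUnitQueue` class and its method names are made up purely for illustration):

```python
from collections import deque

class WorkUnitQueue:
    """In-memory sketch of the redis reliable-queue pattern:
    BRPOPLPUSH atomically moves a work unit from the primary queue
    onto a per-consumer 'working' queue; the unit is only removed
    from 'working' once consumption succeeds."""

    def __init__(self):
        self.primary = deque()   # stands in for the primary redis list
        self.working = deque()   # stands in for a consumer's "working" list

    def push(self, unit):
        self.primary.appendleft(unit)

    def take(self):
        # like BRPOPLPUSH primary working (except the real command blocks
        # when the primary list is empty)
        unit = self.primary.pop()
        self.working.appendleft(unit)
        return unit

    def mark_done(self, unit):
        # consumption succeeded: drop the unit from the working queue
        self.working.remove(unit)

    def recover(self):
        # on startup, any units left on 'working' were abandoned mid-flight
        # (a crash or error); move them back onto the primary queue
        while self.working:
            self.primary.appendleft(self.working.pop())
```

So a work unit taken by a consumer that then crashes (never calls `mark_done`) is put back on the primary queue by `recover` and is consumed again, which is why no data loss should occur.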
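The producer's retry behaviour can be sketched like this (again a hedged Python illustration: the real retry cache is MapDB-backed and on disk, and `send_with_retry`, `RetryCache` and the retry count are invented names for this sketch):

```python
class RetryCache:
    """Stand-in for the MapDB-backed retry cache; just an in-memory list
    here so the sketch is self-contained."""
    def __init__(self):
        self.pending = []

    def save(self, msg):
        self.pending.append(msg)

def send_with_retry(send, msg, cache, retries=3):
    """Try the (async) send up to `retries` times; if it keeps failing
    with a network/connection error, park the message in the retry
    cache so it can be re-sent later."""
    for _ in range(retries):
        try:
            send(msg)
            return True
        except ConnectionError:
            continue          # transient failure: retry
    cache.save(msg)           # give up for now; retried later from cache
    return False
```

The point of the cache is that a message is never silently dropped: either the send eventually succeeds, or the message sits in the cache waiting for the next retry pass.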
Redis is fast and stable, and I've found that even running it on a mixed-service box (e.g. alongside mysql etc.) works fine, but this doesn't mean that the physical hardware underneath can't fail. In my case, in 2 years of production use of redis I've only had one outage (due to hardware failure).

Hope this answers some of your questions. Any ideas for improving this are welcome; feel free to contribute and/or create issue tickets with possible features ;)

Regards,

On Mon, Oct 13, 2014 at 9:58 AM, Daniel Compton <d...@danielcompton.net> wrote:
> Hi Gerrit
>
> Thanks for your contribution, I'm sure everyone here appreciates it,
> especially Clojure developers like myself. I do have one question: what are
> the guarantees you offer to users of your library under failures,
> particularly when Redis fails?
>
> --
> Daniel
>
> > On 13/10/2014, at 10:22 am, Gerrit Jansen van Vuuren <
> gerrit...@gmail.com> wrote:
> >
> > Hi,
> >
> > Just thought I'd put this out for the kafka community to see (if anyone
> > finds it useful, great!!).
> >
> > Kafka-fast is a 100% pure clojure implementation for kafka, but it's not
> > just meant for clojure, because it has a Java API wrapper that can be
> > used from Java, Groovy, JRuby or Scala.
> >
> > This library does not wrap scala; instead it communicates directly with
> > the kafka cluster.
> >
> > It also uses redis for offsets instead of zookeeper and removes the
> > one-consumer-per-partition limit that all other kafka libraries have, by
> > dividing offsets into logical work units and running consumption via a
> > queue (list) in redis.
> >
> > https://github.com/gerritjvv/kafka-fast
> >
> > Any constructive critique is more than welcome :)