Hi Daniel,

At the moment redis is a SPOF (single point of failure) in the architecture, but you can set up replication, and I'm seriously looking into using redis cluster to eliminate this. Some docs that cover this:

http://redis.io/topics/cluster-tutorial
http://redis.io/topics/sentinel
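For reference, a minimal sentinel setup for automatic failover might look something like this (the master name, address and timeouts below are placeholders, not anything from kafka-fast):

```conf
# sentinel.conf sketch: monitor a master named "mymaster" at 127.0.0.1:6379,
# require 2 sentinels to agree (quorum) before starting a failover
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```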
Consumer:

Consumption is split into logical work units of, by default, 10K offsets. If redis fails, the messages currently read will all be consumed, while the redis connection threads go mad trying to reconnect. No data loss should occur (albeit I'm still setting up scenarios in which I will test this).

The consumer threads read work assignments from a redis list using brpoplpush (see http://redis.io/commands/BRPOPLPUSH), which atomically pushes the current work unit onto a "working" queue. If any errors are encountered while consuming, the current work unit is not marked done; on recovery (startup) these "working" queues are scanned and any work units found are placed back onto the primary queue, ready for consumption. If an error message from kafka is received, e.g. error-code 6, the consumer will try to recreate the metadata for a work unit; while this is done the work unit stays on a "working" queue.

Another idea I'm going to play with is disabling consumption per machine/topic, to allow a machine to completely drain its messages in a normal operational way, so that a safe shutdown can be scheduled for any machine without having to rely on multiple threads synching etc.

Producer:

All sends are async and a broker is randomly selected. The producer has a retry cache (implemented using http://www.mapdb.org/); if any failure occurs during sending (network or connection errors), the message is saved and retried. If acks == 1 and a failure response from kafka is received, the message is retried N times. The low-level producer api allows you to bypass all this logic and send to any broker you want; the message send is still async, but you can wait/block on a response from the server and then react as required.

Auditing:

There is an event channel (clojure.core.async) that receives the work units consumed, with a status field for errors. This can be used to audit consumers; I normally send this data to a riemann instance, or just to disk.
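To make the brpoplpush work-unit flow concrete, here is a rough in-memory Python simulation of the pattern (the real library uses redis lists and Clojure; the `WorkUnitQueue` class and its method names are made up purely for illustration):

```python
from collections import deque

class WorkUnitQueue:
    """In-memory sketch of the redis reliable-queue pattern:
    BRPOPLPUSH atomically moves a work unit from the primary queue
    onto a per-consumer 'working' queue; the unit is only removed
    from 'working' once consumption succeeds."""

    def __init__(self):
        self.primary = deque()   # stands in for the primary redis list
        self.working = deque()   # stands in for a consumer's "working" list

    def push(self, unit):
        self.primary.appendleft(unit)

    def take(self):
        # like BRPOPLPUSH primary working (except the real command blocks
        # when the primary list is empty)
        unit = self.primary.pop()
        self.working.appendleft(unit)
        return unit

    def mark_done(self, unit):
        # consumption succeeded: drop the unit from the working queue
        self.working.remove(unit)

    def recover(self):
        # on startup, any units left on 'working' were abandoned mid-flight
        # (a crash or error); move them back onto the primary queue
        while self.working:
            self.primary.appendleft(self.working.pop())
```

So a work unit taken by a consumer that then crashes (never calls `mark_done`) is put back on the primary queue by `recover` and is consumed again, which is why no data loss should occur.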
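The producer's retry behaviour can be sketched like this (again a hedged Python illustration: the real retry cache is MapDB-backed and on disk, and `send_with_retry`, `RetryCache` and the retry count are invented names for this sketch):

```python
class RetryCache:
    """Stand-in for the MapDB-backed retry cache; just an in-memory list
    here so the sketch is self-contained."""
    def __init__(self):
        self.pending = []

    def save(self, msg):
        self.pending.append(msg)

def send_with_retry(send, msg, cache, retries=3):
    """Try the (async) send up to `retries` times; if it keeps failing
    with a network/connection error, park the message in the retry
    cache so it can be re-sent later."""
    for _ in range(retries):
        try:
            send(msg)
            return True
        except ConnectionError:
            continue          # transient failure: retry
    cache.save(msg)           # give up for now; retried later from cache
    return False
```

The point of the cache is that a message is never silently dropped: either the send eventually succeeds, or the message sits in the cache waiting for the next retry pass.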
Redis is fast and stable, and I've found that even running it on a mixed-service box (e.g. alongside mysql etc.) works fine, but this doesn't mean that the physical hardware underneath can't fail. In my case, in 2 years of production use of redis I've only had one outage (due to hardware failure).

Hope this answers some of your questions. Any ideas for improving this are welcome; feel free to contribute and/or create issue tickets with possible features ;)

Regards,

On Mon, Oct 13, 2014 at 9:58 AM, Daniel Compton <d...@danielcompton.net> wrote:
> Hi Gerrit
>
> Thanks for your contribution, I'm sure everyone here appreciates it,
> especially Clojure developers like myself. I do have one question: what are
> the guarantees you offer to users of your library under failures,
> particularly when Redis fails?
>
> --
> Daniel
>
> > On 13/10/2014, at 10:22 am, Gerrit Jansen van Vuuren <
> gerrit...@gmail.com> wrote:
> >
> > Hi,
> >
> > Just thought I'd put this out for the kafka community to see (if anyone
> > finds it useful, great!!).
> >
> > Kafka-fast is a 100% pure clojure implementation for kafka, but it's not
> > just meant for clojure, because it has a Java API wrapper that can be
> > used from Java, Groovy, JRuby or Scala.
> >
> > This library does not wrap scala; instead it communicates directly with
> > the kafka cluster.
> >
> > It also uses redis for offsets instead of zookeeper and removes the
> > one-consumer-per-partition limit that all other kafka libraries have, by
> > dividing offsets into logical work units and running consumption via a
> > queue (list) in redis.
> >
> > https://github.com/gerritjvv/kafka-fast
> >
> > Any constructive critique is more than welcome :)