On 7 Mar 2014, at 14:11, "Maier, Dr. Andreas" <andreas.ma...@asideas.de> wrote:

>> In your case, it sounds like time-based retention with a fairly long
>> retention period is the way to go. You could potentially store the
>> offsets of messages to retry in a separate Kafka topic.
>
> I was also thinking about doing that. However, what do I do if I again
> have errors when processing the offsets from that Kafka topic?
> Since I cannot delete the offsets of messages from the Kafka topic that
> have been processed successfully, I would have to create another
> Kafka topic to again store the remaining offsets, and then maybe another
> one, and then another one, and so on.
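Just to make the "offsets in a separate topic" suggestion quoted above
concrete, here is a minimal sketch of the mechanics. It assumes the
kafka-python client; the topic names and the process() function are made
up, and it doesn't address the cascading problem you describe:

    from kafka import KafkaConsumer, KafkaProducer

    # Made-up names: 'events' is the main topic, 'events-to-retry' holds
    # the offsets of messages whose processing failed.
    consumer = KafkaConsumer('events',
                             bootstrap_servers='localhost:9092',
                             group_id='my-worker')
    producer = KafkaProducer(bootstrap_servers='localhost:9092')

    def process(value):
        """Placeholder for the real message-processing logic."""
        ...

    for msg in consumer:
        try:
            process(msg.value)
        except Exception:
            # Record partition and offset of the failed message so that a
            # later pass can fetch it again and retry.
            entry = '%d:%d' % (msg.partition, msg.offset)
            producer.send('events-to-retry', entry.encode('utf-8'))

A second consumer would then read 'events-to-retry', seek to each recorded
partition/offset and try again -- and, as you say, failures on that pass
would seem to need yet another topic.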
You might be interested to have a look at what Samza does:
http://samza.incubator.apache.org/learn/documentation/0.7.0/ -- it's a
stream processing framework that builds on Kafka's features. It still
processes messages sequentially per partition, so it doesn't do the
per-message retry that you describe, but it does use a separate Kafka
topic for checkpointing state and recovering from failure. (It doesn't
require a cascade of topics.)

> That seems awkward to me.
> Wouldn't it be better to simply have a mutable list of offsets, read
> from that list, and if a message was successfully processed, remove
> the offset from the list? That way one could immediately see from the
> length of the list how many messages still need to be processed.
> Since Kafka topics are append-only, they don't seem to be a good fit
> for this kind of logic.

Indeed. If you want per-message acknowledgement and redelivery, perhaps
something like RabbitMQ or ActiveMQ is a better fit for your use case.
Kafka's design is optimised for very high-throughput sequential processing
of messages, whereas RabbitMQ is better for "job queue" use cases where
you want to retry individual messages out of order.

Martin
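P.S. For comparison, per-message acknowledgement and redelivery in
RabbitMQ looks roughly like the following. This is only a sketch, assuming
a recent version of the pika Python client; the queue name and the
process() function are made up:

    import pika

    def process(body):
        """Placeholder for the real message-processing logic."""
        ...

    def handle(ch, method, properties, body):
        try:
            process(body)
            # Acknowledge just this one message; the broker forgets it.
            ch.basic_ack(delivery_tag=method.delivery_tag)
        except Exception:
            # Negative acknowledgement: ask the broker to redeliver this
            # individual message, independently of the others.
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='jobs', durable=True)
    channel.basic_consume(queue='jobs', on_message_callback=handle)
    channel.start_consuming()

With manual acknowledgements like this, the broker tracks the state of
each message individually -- exactly the per-message bookkeeping that
Kafka's append-only, offset-based design avoids in order to get its
throughput.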