On 7 Mar 2014, at 14:11, "Maier, Dr. Andreas" <andreas.ma...@asideas.de> wrote:
>> In your case, it sounds like time-based retention with a fairly long
>> retention period is the way to go. You could potentially store the
>> offsets of messages to retry in a separate Kafka topic.
> 
> I was also thinking about doing that. However, what do I do, if I have
> again some errors when processing the offsets from that Kafka topic?
> Since I cannot delete the offsets of messages from the Kafka topic that
> have been processed successfully, I would have to create another
> Kafka topic to again store the remaining offsets and then maybe another
> one and then another one, and so on.

You might be interested to have a look at what Samza does: 
http://samza.incubator.apache.org/learn/documentation/0.7.0/ -- it's a stream 
processing framework that builds on Kafka's features. It still processes 
messages sequentially per partition, so it doesn't do the per-message retry 
that you describe, but it does use a separate Kafka topic for checkpointing 
state and recovering from failure. (It doesn't require a cascade of topics.)
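
To make the "separate retry topic" idea a bit more concrete, here is a rough
sketch of what publishing the offset of a failed message to a side topic could
look like, using the newer Kafka Java producer API. The class name, the topic
name "retries" and the plain-string encoding are just placeholders I made up
for illustration:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class RetryOffsetWriter {
        private final KafkaProducer<String, String> producer;

        public RetryOffsetWriter(String brokers) {
            Properties props = new Properties();
            props.put("bootstrap.servers", brokers);
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            producer = new KafkaProducer<>(props);
        }

        // Record that the message at (topic, partition, offset) still needs a retry.
        public void recordFailure(String topic, int partition, long offset) {
            String key = topic + "-" + partition;
            producer.send(new ProducerRecord<>("retries", key, Long.toString(offset)));
        }
    }

A consumer of the "retries" topic would then re-read those offsets and attempt
the messages again; if some of them fail once more, you can append them to the
same retry topic rather than creating yet another one.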

> That seems awkward to me.
> Wouldn't it be better to simply have a mutable list of offsets: read from
> that list and, if a message was successfully processed,
> remove its offset from the list. That way one could immediately see from
> the length of the list how many messages still need to be processed.
> Since Kafka topics are append only they don't seem to be a good fit for
> this kind of logic.

Indeed. If you want per-message acknowledgement and redelivery, perhaps 
something like RabbitMQ or ActiveMQ is a better fit for your use case. Kafka's 
design is optimised for very high-throughput sequential processing of messages, 
whereas RabbitMQ is better for "job queue" use cases where you want to retry 
individual messages out-of-order.
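
For comparison, per-message acknowledgement and redelivery in RabbitMQ looks
roughly like this minimal sketch with the Java client (the queue name "jobs"
and the process() call are placeholders for your own logic):

    import java.io.IOException;
    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.DefaultConsumer;
    import com.rabbitmq.client.Envelope;

    public class JobConsumer {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");
            Connection conn = factory.newConnection();
            final Channel channel = conn.createChannel();
            channel.queueDeclare("jobs", true, false, false, null);

            boolean autoAck = false; // acknowledge each message explicitly
            channel.basicConsume("jobs", autoAck, new DefaultConsumer(channel) {
                @Override
                public void handleDelivery(String consumerTag, Envelope envelope,
                                           AMQP.BasicProperties properties, byte[] body)
                        throws IOException {
                    try {
                        process(body); // your per-message processing
                        // success: remove the message from the queue
                        channel.basicAck(envelope.getDeliveryTag(), false);
                    } catch (Exception e) {
                        // failure: requeue the message for a later retry
                        channel.basicNack(envelope.getDeliveryTag(), false, true);
                    }
                }
            });
        }

        static void process(byte[] body) { /* placeholder */ }
    }

With autoAck turned off, a message remains in (or is returned to) the queue
until it is explicitly acknowledged, which is essentially the mutable list of
pending messages you are describing.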

Martin
