I don't want to endorse this use of Kafka, but assuming you can give each message a unique identifier, I believe log compaction will keep all unique messages forever. You can read about how consumer offsets stored in Kafka are managed using a compacted topic here: http://kafka.apache.org/documentation.html#distributionimpl In that case, the consumer group id + topic + partition forms a unique message id, and the brokers read that topic into the offsets cache on startup (and take updates to the cache via the same topic). If you have a finite, smallish data set that you want indexed in multiple systems, that might be a good approach.
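For illustration, here's a minimal sketch of what the producer side might look like (the topic name, key, and payload are made up). It assumes the topic was created with the topic-level config cleanup.policy=compact (e.g. passed as --config cleanup.policy=compact when creating the topic), and that every record carries an explicit key, since compaction retains only the latest record per key:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CompactedTopicWriter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The key acts as the unique message id; compaction keeps the
                // latest record for each distinct key indefinitely.
                producer.send(new ProducerRecord<>("events", "event-42",
                        "payload for event 42"));
            }
        }
    }

The point is that compaction is keyed deduplication, not a retention guarantee for every record ever written: if you reuse a key, older records for that key become eligible for cleanup.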
If your data can grow without bound, Kafka doesn't seem to me like a good choice. Even with compaction, you will still have to read it all sequentially, message by message, to get it into a different system (see the sketch below the quoted message). As far as I know, there is no lookup by id, and even seeking to a specific date is a manual O(log n) process. (Warning: I'm just another user, so I may have a few things wrong.)

On Fri, Jul 10, 2015 at 3:47 AM Daniel Schierbeck <daniel.schierb...@gmail.com> wrote:

> I'd like to use Kafka as a persistent store – sort of as an alternative to
> HDFS. The idea is that I'd load the data into various other systems in
> order to solve specific needs such as full-text search, analytics, indexing
> by various attributes, etc. I'd like to keep a single source of truth,
> however.
>
> I'm struggling a bit to understand how I can configure a topic to retain
> messages indefinitely. I want to make sure that my data isn't deleted. Is
> there a guide to configuring Kafka like this?
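To make the replay point above concrete, here's a rough sketch of what loading a topic into another system looks like with the Java consumer API (the topic name, group id, and indexing function are hypothetical). The only access pattern available is a linear scan from the earliest offset forward:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class FullTopicReplay {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "replay-into-search-index");
            // Start from the earliest offset: replaying means reading everything.
            props.put("auto.offset.reset", "earliest");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // No random access: every record flows past, in offset
                        // order, on its way into the downstream system.
                        indexIntoDownstreamSystem(record.key(), record.value());
                    }
                }
            }
        }

        // Placeholder for whatever external system you are loading.
        static void indexIntoDownstreamSystem(String key, String value) {
            System.out.printf("%s -> %s%n", key, value);
        }
    }

For a small data set that's fine, but if the topic keeps growing, every new downstream system pays the cost of scanning the full history.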