I don't want to endorse this use of Kafka, but assuming you can give your
messages unique keys, I believe log compaction will keep every message
forever: compaction only discards older messages that share a key with a
newer one, so messages with unique keys are never removed. You can read
about how consumer offsets stored in Kafka are managed using a compacted
topic here: http://kafka.apache.org/documentation.html#distributionimpl
In that case, the consumer group id + topic + partition forms the unique
key, and the brokers read that topic into the offsets cache on startup
(and take updates to the cache via the same topic). If you have a finite,
smallish data set that you want indexed in multiple systems, that might be
a good approach.
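
For what it's worth, here's roughly what creating such a topic looks like
with the kafka-topics.sh tool that ships with Kafka (the topic name and the
partition/replica counts are just placeholders):

  bin/kafka-topics.sh --zookeeper localhost:2181 --create \
    --topic source-of-truth --partitions 8 --replication-factor 3 \
    --config cleanup.policy=compact

Note that every message produced to a compacted topic needs a non-null key,
since compaction works per key.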

If your data can grow without bound, Kafka doesn't seem to me like a good
choice. Even with compaction, you will still have to read the whole topic
sequentially, message by message, to get it into a different system. As far
as I know, there is no lookup by id, and even seeking to a specific date is
a manual O(log n) process.
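
To illustrate what that sequential scan looks like, here's a minimal sketch
using the newer Java consumer API; the topic name, group id, and the index()
helper are placeholders for whatever downstream system you're loading:

  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;

  public class Reindexer {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");
          // a fresh group id plus auto.offset.reset=earliest starts at offset 0
          props.put("group.id", "reindex-" + System.currentTimeMillis());
          props.put("auto.offset.reset", "earliest");
          props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              consumer.subscribe(Collections.singletonList("source-of-truth"));
              while (true) {
                  // poll returns the next batch of messages in offset order
                  ConsumerRecords<String, String> records = consumer.poll(1000);
                  for (ConsumerRecord<String, String> record : records) {
                      index(record.key(), record.value()); // hand off downstream
                  }
              }
          }
      }

      // placeholder for whatever loading the downstream system needs
      static void index(String key, String value) {
          System.out.println(key + " => " + value);
      }
  }

There's no way to jump straight to a particular message by id; the best you
can do is start from some offset and scan forward.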

(warning: I'm just another user, so I may have a few things wrong.)


On Fri, Jul 10, 2015 at 3:47 AM Daniel Schierbeck <
daniel.schierb...@gmail.com> wrote:

> I'd like to use Kafka as a persistent store – sort of as an alternative to
> HDFS. The idea is that I'd load the data into various other systems in
> order to solve specific needs such as full-text search, analytics, indexing
> by various attributes, etc. I'd like to keep a single source of truth,
> however.
>
> I'm struggling a bit to understand how I can configure a topic to retain
> messages indefinitely. I want to make sure that my data isn't deleted. Is
> there a guide to configuring Kafka like this?
>
