Since it is apparently ill-advised to use Kafka for long-term storage, I'm
investigating ways to archive messages in a standardized format that adds as
little complexity as possible to consumer applications. The use case I have
in mind is an application that needs to process all data ever written to a
topic before it can continue streaming messages as they're written to Kafka.

My plan is to use e.g. Avro data files to store the messages written to each
partition in large batches, with the key and value stored as opaque blobs.
Each record would also include the message's offset within the partition.
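To make that concrete, here's a rough, untested sketch of what writing such
an archive file might look like with Avro's Java API. The schema name, the
file naming convention, and the "message" variable are placeholders of my
own:

    import java.io.File;
    import java.nio.ByteBuffer;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    // One record per archived message: the partition offset plus the key
    // and value as opaque blobs.
    Schema schema = new Schema.Parser().parse(
        "{\"type\": \"record\", \"name\": \"ArchivedMessage\", \"fields\": [" +
        "{\"name\": \"offset\", \"type\": \"long\"}," +
        "{\"name\": \"key\", \"type\": [\"null\", \"bytes\"]}," +
        "{\"name\": \"value\", \"type\": \"bytes\"}]}");

    // One archive file per topic partition, e.g. events-0.avro.
    DataFileWriter<GenericRecord> writer =
        new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
    writer.create(schema, new File("events-0.avro"));

    // "message" stands in for whatever the archiver consumed from Kafka.
    GenericRecord record = new GenericData.Record(schema);
    record.put("offset", message.offset());
    record.put("key", message.key() == null ? null : ByteBuffer.wrap(message.key()));
    record.put("value", ByteBuffer.wrap(message.value()));
    writer.append(record);
    writer.close();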

I hope to be able to use Consumer Groups to process these files in order to

1. get coordination and partition assignment for free;
2. get error recovery for free; and
3. easily be able to switch over to reading directly from Kafka once the
   archived messages have been processed (see the sketch after this list).
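Concretely, I imagine the switch-over in (3) working roughly like the sketch
below, using the Java consumer's rebalance listener. processArchiveFor is a
hypothetical helper that reads the archive files for a partition and returns
the last archived offset, and "props" is an ordinary consumer configuration;
none of this is tested:

    import java.time.Duration;
    import java.util.Collection;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);

    consumer.subscribe(Collections.singletonList("events"),
        new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                for (TopicPartition tp : partitions) {
                    // Drain the archive for this partition first...
                    long lastArchivedOffset = processArchiveFor(tp);
                    // ...then resume from the next offset in Kafka itself.
                    consumer.seek(tp, lastArchivedOffset + 1);
                }
            }

            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {}
        });

    while (true) {
        for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofMillis(500))) {
            process(record); // hypothetical handler
        }
    }

A real implementation would presumably check the group's committed offset
before re-reading the archive, so a rebalance doesn't reprocess everything.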

My main concern is that the offset commit API won't accept an offset for a
message that has expired.
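That is, after draining the archive for a partition I'd effectively be doing
the following, where archivedOffset may point at a message that has long
since been deleted by retention (again just a sketch, reusing the consumer
from above):

    import java.util.Collections;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    // Commit the offset after the last archived message, even though that
    // offset may lie below the partition's current log start offset.
    Map<TopicPartition, OffsetAndMetadata> offsets = Collections.singletonMap(
        new TopicPartition("events", 0),
        new OffsetAndMetadata(archivedOffset + 1));
    consumer.commitSync(offsets);

If the broker refuses such a commit, the whole scheme would need a different
bookkeeping mechanism.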

Do you have any experience building a system like this?


Daniel Schierbeck
Senior Staff Engineer @ Zendesk
