Since it is apparently ill-advised to use Kafka for long-term storage, I'm investigating ways to archive messages in a standardized way. The goal is to add as little complexity as possible to consumer applications that may need to process all data ever written to a topic before switching over to streaming new messages as they're written to Kafka.
My plan is to use e.g. Avro data files to store messages written to each partition in large batches, with the key and value treated as opaque blobs. Each record would include the offset of the message within the partition. I hope to be able to use Consumer Groups to process these files in order to:

1. get coordination and partition assignment for free;
2. get error recovery for free; and
3. easily switch over to reading directly from Kafka once the archived messages have been processed.

My main concern is that the offset commit API won't accept an offset for a message that has expired. Does anyone have experience building a system such as this?

Daniel Schierbeck
Senior Staff Engineer @ Zendesk
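P.S. To make the idea concrete, here is a sketch of the Avro schema I have in mind for the archived records (the record and field names are illustrative, not from any existing spec; partition could also be encoded in the file path rather than per record):

```json
{
  "type": "record",
  "name": "ArchivedMessage",
  "fields": [
    {"name": "partition", "type": "int"},
    {"name": "offset",    "type": "long"},
    {"name": "key",       "type": ["null", "bytes"], "default": null},
    {"name": "value",     "type": ["null", "bytes"], "default": null}
  ]
}
```

Keeping key and value as nullable bytes means the archiver never has to understand the payload serialization, and the per-record offset is what would let a consumer resume from Kafka at the right position after draining the archive.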