> I'm not sure what HDFS backups would bring

I’m not sure how realistic the threat is, but I was thinking of a case in which a bug in Kafka corrupts the log files. I would personally sleep better knowing there is a Kafka-independent backup of all data.
> how would you recover from e.g. all Kafka nodes blowing up if you only have
> an HDFS backup

I’m not sure which problem you see. I am thinking of a backup which has a copy of every message and its key, by partition, in offset order. New messages would be appended to the backup as fast as they appear in Kafka. It should then be possible to recreate a Kafka cluster by creating the topics and producing the messages from the backup, in their original order, into the empty cluster. In a system in which consumers maintain their own offsets, consumers should in theory see no change after the reconstruction (consumers should be shut down while the recreation is ongoing).
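
To make the restore idea concrete, here is a minimal sketch in Python. It does not talk to Kafka or HDFS at all: `BackupRecord`, `InMemoryLog` and `restore` are hypothetical names invented for illustration, and `InMemoryLog` only imitates the one property that matters here, namely that an empty partition hands out offsets 0, 1, 2, ... as messages are appended. The sketch also assumes the backup holds every message of each partition starting at offset 0. If the original log had gaps (for example after retention or compaction), the replayed offsets would no longer match, so the sketch checks for that and fails loudly.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class BackupRecord(NamedTuple):
    """One backed-up message (hypothetical layout, not a Kafka format)."""
    partition: int
    offset: int
    key: Optional[bytes]
    value: bytes


class InMemoryLog:
    """Stand-in for an empty topic: each partition assigns offsets 0, 1, 2, ..."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def append(self, partition, key, value):
        log = self.partitions[partition]
        log.append((key, value))
        return len(log) - 1  # offset assigned to the appended message


def restore(backup, target):
    """Replay backup records into target, per partition, in offset order.

    Raises ValueError if a replayed message lands on a different offset
    than it had originally (for example because the backup has a gap).
    """
    by_partition = defaultdict(list)
    for rec in backup:
        by_partition[rec.partition].append(rec)

    for partition, records in sorted(by_partition.items()):
        for rec in sorted(records, key=lambda r: r.offset):
            new_offset = target.append(partition, rec.key, rec.value)
            if new_offset != rec.offset:
                raise ValueError(
                    f"partition {partition}: offset {rec.offset} "
                    f"restored as {new_offset}"
                )
    return target
```

Calling `restore(records, InMemoryLog())` with a complete backup yields a log in which every message sits at its original offset, which is the condition under which consumers that keep their own offsets could resume unchanged. With a real cluster the `append` step would be a producer send to an explicit partition, and the same offset check would be worth keeping.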