Werner Daehn created KAFKA-6563:
-----------------------------------

             Summary: Kafka online backup
                 Key: KAFKA-6563
                 URL: https://issues.apache.org/jira/browse/KAFKA-6563
             Project: Kafka
          Issue Type: Improvement
          Components: core
            Reporter: Werner Daehn
If you consider Kafka to be a "database" with "transaction logs", you need a backup/recovery capability just like databases have. The beauty of such a solution would be that it enables Kafka for smaller scenarios where you do not want to run a large cluster. You could even use a single-node Kafka. In the worst case you lose all data since the last backup and have to ask the sources to send that data again - for most sources that is possible.

Currently you have multiple options, none of which are good:
# Set up Kafka fault tolerant and with replication factors: needs larger servers and does not prevent many types of problems, e.g. software bugs, deleting a topic by accident, ...
# Mirror Kafka: very expensive.
# Shut down Kafka, copy the disks, start Kafka again.
# Add a database in front of Kafka as the primary persistence: very, very expensive and it forfeits the idea of Kafka.

I wonder what is really needed for an online backup strategy. If I am not mistaken it is relatively little (a rough sketch follows at the end):
* A command that causes Kafka to switch to new files, so that the files containing all past data no longer change.
* An export of the current Zookeeper values, unless they can be recreated from the transaction log files anyhow.
* Then the Kafka files can be backed up.
* A command that tells Kafka the backup is finished, so it can clean things up.
* Later, a way to merge the recovered backup instance with the Kafka log written since then, up to a certain point in time.

Example: The backup was taken at midnight and a topic was deleted by accident at 11:00. You start with the backup, apply the logs until 10:59, and then you bring Kafka fully online again.
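To make the "then the Kafka files can be backed up" step concrete, here is a minimal sketch in Java of what the file-level copy could look like once the broker has switched to new files. It relies only on the fact that closed Kafka log segments (and their .index/.timeindex companions, all named after the zero-padded base offset) are immutable. The paths, the class name and the "segments were just rolled" precondition are illustrative assumptions; no such command or tool exists in Kafka today.

{code:java}
// Minimal sketch, assuming the broker has just rolled its segments so that
// all but the newest .log file per partition directory are immutable.
// LOG_DIR and BACKUP_DIR are hypothetical example paths.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SegmentBackup {

    static final Path LOG_DIR = Paths.get("/var/kafka-logs");  // broker log.dirs (assumed)
    static final Path BACKUP_DIR = Paths.get("/backup/kafka"); // backup target (assumed)

    public static void main(String[] args) throws IOException {
        List<Path> partitionDirs;
        try (Stream<Path> s = Files.list(LOG_DIR)) {
            partitionDirs = s.filter(Files::isDirectory).collect(Collectors.toList());
        }
        for (Path dir : partitionDirs) {
            backupClosedSegments(dir);
        }
    }

    // Copies every closed .log segment plus its .index/.timeindex companions.
    // Segment file names are the zero-padded base offset, so a lexical sort
    // equals offset order and the last entry is the active (still-written) one.
    static void backupClosedSegments(Path partitionDir) throws IOException {
        List<Path> segments;
        try (Stream<Path> files = Files.list(partitionDir)) {
            segments = files.filter(p -> p.toString().endsWith(".log"))
                            .sorted()
                            .collect(Collectors.toList());
        }
        if (segments.size() < 2) return; // only the active segment exists

        for (Path segment : segments.subList(0, segments.size() - 1)) {
            copyRelative(segment);
            for (String ext : new String[] {".index", ".timeindex"}) {
                Path idx = Paths.get(segment.toString().replace(".log", ext));
                if (Files.exists(idx)) {
                    copyRelative(idx);
                }
            }
        }
    }

    // Mirrors a file into BACKUP_DIR, preserving the topic-partition layout.
    static void copyRelative(Path source) throws IOException {
        Path target = BACKUP_DIR.resolve(LOG_DIR.relativize(source));
        Files.createDirectories(target.getParent());
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
{code}

Recovery would then be the reverse: restore these files into an empty log directory, bring the broker up, and replay everything written after the backup point up to the desired timestamp - which is exactly the merge step described in the last bullet above.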