Werner Daehn created KAFKA-6563:
-----------------------------------

             Summary: Kafka online backup
                 Key: KAFKA-6563
                 URL: https://issues.apache.org/jira/browse/KAFKA-6563
             Project: Kafka
          Issue Type: Improvement
          Components: core
            Reporter: Werner Daehn


If you consider Kafka to be a "database" with "transaction logs", it needs a backup/recovery strategy just like databases have.

The beauty of such a solution would be to enable Kafka for smaller scenarios where you do not want to run a large cluster. You could even use a single-node Kafka. In the worst case you lose all data written since the last backup and have to ask the sources to send that data again - for most sources that is possible.

 

Currently you have multiple options, none of which are good.
 # Set up Kafka fault tolerant and with replication factors: needs larger servers and does not prevent many kinds of problems, such as software bugs, deleting a topic by accident, ...
 # Mirror Kafka: very expensive.
 # Shut down Kafka, copy the files on disk, start Kafka again: a cold backup with downtime (see the sketch after this list).
 # Add a database in front of Kafka as the primary persistence: very, very expensive, and it forfeits the idea of Kafka.
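
To illustrate the third option: a cold backup is essentially a recursive copy of every directory listed in the broker's log.dirs while the broker is stopped. A minimal sketch in Java; the paths are placeholders, not Kafka defaults, and the broker must already be shut down:

{code:java}
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

// Cold-backup sketch (option 3): copy a broker log directory while the
// broker is stopped. Both paths are placeholders for this illustration.
public class ColdBackup {
    public static void main(String[] args) throws IOException {
        Path source = Paths.get("/var/lib/kafka-logs"); // one log.dirs entry
        Path target = Paths.get("/backup/kafka-logs");  // backup destination

        try (Stream<Path> entries = Files.walk(source)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                Path dest = target.resolve(source.relativize(p));
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    // Keep timestamps; with the broker down, no segment or
                    // index file changes during the copy.
                    Files.copy(p, dest, StandardCopyOption.COPY_ATTRIBUTES,
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
{code}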

 

I wonder what really is needed for an online backup strategy. If I am not mistaken, it is relatively little.
 * A command that causes Kafka to roll over to new log segment files, so that the files containing all past data no longer change.
 * An export of the current ZooKeeper values, unless they can be recreated from the transaction log files anyway (see the first sketch after this list).
 * Then the Kafka files can be backed up.
 * A command that tells Kafka the backup is finished, so it can clean things up.
 * Later, a way to merge the recovered backup instance with the Kafka log written since then, up to a certain point in time. Example: the backup was taken at midnight and a topic was deleted by accident at 11:00. You start with the backup, apply the logs until 10:59 and then bring Kafka fully online again (see the replay sketch after this list).
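
For the ZooKeeper export, a recursive dump of the znodes Kafka keeps (under paths such as /brokers and /config) would probably be enough. A sketch using the plain ZooKeeper Java client; the connection string, the chosen roots and the output format are assumptions:

{code:java}
import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;

// Sketch: recursively dump Kafka's ZooKeeper metadata so it can be
// archived next to the log-file backup. Connection string, roots and
// output format are assumptions for illustration only.
public class ZkExport {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });
        try {
            for (String root : new String[] { "/brokers", "/config" }) {
                dump(zk, root);
            }
        } finally {
            zk.close();
        }
    }

    static void dump(ZooKeeper zk, String path) throws Exception {
        byte[] data = zk.getData(path, false, null);
        String value = data == null
                ? "" : new String(data, StandardCharsets.UTF_8);
        System.out.println(path + " = " + value);
        for (String child : zk.getChildren(path, false)) {
            dump(zk, path + "/" + child);
        }
    }
}
{code}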

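The point-in-time part of the last step can already be approximated with the existing client API: KafkaConsumer.offsetsForTimes() maps the cut-off timestamp (the 10:59 in the example) to an offset per partition, and a copy loop replays everything before that offset into the restored cluster. A rough sketch for a single topic; the topic name, bootstrap servers and cut-off timestamp are assumptions:

{code:java}
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.*;

// Sketch: replay one topic from a restored backup cluster into the live
// cluster, stopping just before a cut-off timestamp. Topic name, servers
// and the timestamp itself are assumptions.
public class PointInTimeReplay {
    public static void main(String[] args) {
        String topic = "events";          // assumed topic name
        long cutoffMs = 1518303540000L;   // assumed cut-off, epoch millis

        Properties c = new Properties();
        c.put("bootstrap.servers", "backup-cluster:9092");
        c.put("enable.auto.commit", "false");
        c.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        c.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Properties p = new Properties();
        p.put("bootstrap.servers", "live-cluster:9092");
        p.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        p.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(c);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p)) {

            List<TopicPartition> parts = new ArrayList<>();
            for (PartitionInfo pi : consumer.partitionsFor(topic)) {
                parts.add(new TopicPartition(topic, pi.partition()));
            }
            consumer.assign(parts);
            consumer.seekToBeginning(parts);

            // First offset at or after the cut-off, per partition; partitions
            // with no such record are replayed up to their current end.
            Map<TopicPartition, Long> times = new HashMap<>();
            for (TopicPartition tp : parts) {
                times.put(tp, cutoffMs);
            }
            Map<TopicPartition, OffsetAndTimestamp> cut =
                    consumer.offsetsForTimes(times);
            Map<TopicPartition, Long> ends = consumer.endOffsets(parts);
            Map<TopicPartition, Long> stopAt = new HashMap<>();
            for (TopicPartition tp : parts) {
                OffsetAndTimestamp ot = cut.get(tp);
                stopAt.put(tp, ot != null ? ot.offset() : ends.get(tp));
            }

            Set<TopicPartition> open = new HashSet<>(parts);
            while (!open.isEmpty()) {
                for (ConsumerRecord<byte[], byte[]> rec :
                        consumer.poll(Duration.ofSeconds(1))) {
                    TopicPartition tp =
                            new TopicPartition(rec.topic(), rec.partition());
                    if (rec.offset() < stopAt.get(tp)) {
                        producer.send(new ProducerRecord<>(topic,
                                rec.partition(), rec.key(), rec.value()));
                    }
                }
                // A partition is done once its position reaches the stop offset.
                open.removeIf(tp -> consumer.position(tp) >= stopAt.get(tp));
            }
            producer.flush();
        }
    }
}
{code}

This is of course only the replay half: it presumes the topic still exists (or has been recreated) on the live cluster and that the backup cluster was restored from the file-level backup described above.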