Since Kafka itself has replication, I'm not sure what HDFS backups would bring – how would you recover from e.g. all Kafka nodes blowing up if you only have an HDFS backup? Why not use MirrorMaker to replicate the cluster to a remote DC, with a process of reversing the direction in case you need to recover?
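For concreteness, a minimal sketch of that MirrorMaker setup, assuming a 0.9+ cluster using the new consumer; the hostnames, group id, file names, and whitelist below are placeholders for illustration, not anything from this thread:

  # source.consumer.properties -- consumes from the primary cluster (placeholder hosts)
  bootstrap.servers=kafka-primary-1:9092,kafka-primary-2:9092
  group.id=mirrormaker-dr
  # note: older MirrorMaker builds default to the old consumer, which expects zookeeper.connect instead

  # target.producer.properties -- produces into the DR cluster (placeholder hosts)
  bootstrap.servers=kafka-dr-1:9092,kafka-dr-2:9092

  # usually run in or near the target datacenter; mirror every topic
  bin/kafka-mirror-maker.sh \
    --consumer.config source.consumer.properties \
    --producer.config target.producer.properties \
    --whitelist '.*'

Reversing the direction for recovery is then largely a matter of swapping the two config files so the DR cluster becomes the source.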
On Tue, Mar 15, 2016 at 10:07 AM Giidox <a...@marmelandia.com> wrote:

> Good points.
>
> I would back up all partitions to HDFS (etc.), as fast as the data arrives.
> In case Kafka becomes corrupted, the topics can be repopulated from the
> backup. In my case, all clients track their own offsets, so they should in
> theory be able to continue as if nothing had happened.
>
> Regenerating (duplicating?) data for non-production environments could then
> be done from either the production Kafka cluster or from the backup.
>
> On 14 Mar 2016, at 12:32, Ben Stopford <b...@confluent.io> wrote:
>
> > - Compacted topics provide a useful way to retain meaningful datasets
> > inside the broker, which don’t grow indefinitely. If you have an
> > update-in-place use case, where the event-sourced approach doesn’t buy you
> > much, these will keep the reload time down when you regenerate materialised
> > views.
>
> > - When going down the master data store route, a few different problems
> > may conflate: disaster recovery, historic backups, and regenerating data
> > in non-production environments.
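For the HDFS backup route mentioned above, the continuous copy is typically done via Kafka Connect. A rough sketch using Confluent's HDFS sink connector follows; the topic names, namenode URL, and sizes are placeholders, and the exact property set depends on the connector version:

  name=hdfs-backup
  connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
  tasks.max=4
  # placeholder topic list and namenode address
  topics=orders,payments
  hdfs.url=hdfs://namenode:8020
  # commit a file to HDFS after roughly this many records per topic partition
  flush.size=10000

And on the compacted-topics point, compaction is just a per-topic config at creation time (topic name, partition and replica counts, and the ZooKeeper address below are again placeholders):

  bin/kafka-topics.sh --zookeeper zk1:2181 --create --topic customer-latest \
    --partitions 8 --replication-factor 3 --config cleanup.policy=compact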