Since Kafka itself has replication, I'm not sure what HDFS backups would
bring. How would you recover if, say, every Kafka node were lost and all
you had was an HDFS backup? Why not use MirrorMaker to replicate the
cluster to a remote DC, with a process for reversing the direction of
replication if you ever need to recover?
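
For what it's worth, MirrorMaker is conceptually just a consumer on the
source cluster feeding a producer on the target cluster; in practice you
would run kafka-mirror-maker.sh with a consumer config pointing at the
source DC and a producer config pointing at the remote DC. A rough Java
sketch of that underlying pattern, with placeholder broker addresses,
topic name and group id (written against recent Java clients; not a
replacement for the real tool):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MirrorSketch {
    public static void main(String[] args) {
        // Consumer on the primary DC (address is a placeholder).
        Properties c = new Properties();
        c.put("bootstrap.servers", "kafka-primary:9092");
        c.put("group.id", "mirror-sketch");
        c.put("enable.auto.commit", "false");
        c.put("key.deserializer",
              "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        c.put("value.deserializer",
              "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // Producer on the remote DC (address is a placeholder).
        Properties p = new Properties();
        p.put("bootstrap.servers", "kafka-remote:9092");
        p.put("key.serializer",
              "org.apache.kafka.common.serialization.ByteArraySerializer");
        p.put("value.serializer",
              "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(c);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> r : records) {
                    // Re-publish under the same topic name on the remote cluster.
                    producer.send(new ProducerRecord<>(r.topic(), r.key(), r.value()));
                }
                // The real tool waits for producer acks before committing source
                // offsets; this sketch just commits after handing records over.
                consumer.commitSync();
            }
        }
    }
}

Reversing the direction for recovery is then a matter of swapping which
cluster the consumer and producer configs point at.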

On Tue, Mar 15, 2016 at 10:07 AM Giidox <a...@marmelandia.com> wrote:

> Good points.
>
> I would back up all partitions to HDFS (or similar storage) as the data
> arrives. If Kafka becomes corrupted, the topics can be repopulated from the
> backup. In my case, all clients track their own offsets, so they should in
> theory be able to continue as if nothing had happened.
>
> Regenerating (duplicating?) data for non-production environments could then
> be done either from the production Kafka cluster or from the backup.
>
> > On 14 Mar 2016, at 12:32, Ben Stopford <b...@confluent.io> wrote:
> >
> > - Compacted topics provide a useful way to retain meaningful datasets
> > inside the broker without them growing indefinitely. If you have an
> > update-in-place use case, where the event-sourced approach doesn't buy you
> > much, they will keep the reload time down when you regenerate materialised
> > views.
> > - When going down the master-data-store route, a few distinct problems
> > tend to get conflated: disaster recovery, historic backups, and
> > regenerating data in non-production environments.
>
>
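
Re the restore path described above (repopulating topics from the HDFS
backup): the core of it is just producing the archived records back into
the rebuilt cluster. A rough Java sketch, assuming the backup has already
been copied to a local file holding one record value per line (the file
path, topic name and broker address are placeholders; a real backup would
also need to store keys, and re-produced records get new offsets, so
client-side offsets only line up if the restore preserves the original
order and skips nothing):

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RestoreSketch {
    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        p.put("bootstrap.servers", "kafka-rebuilt:9092"); // placeholder
        p.put("key.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p);
             BufferedReader backup = Files.newBufferedReader(
                     Paths.get("/backups/my-topic/partition-0.txt"),
                     StandardCharsets.UTF_8)) {
            String line;
            while ((line = backup.readLine()) != null) {
                // One archived record value per line in this sketch.
                producer.send(new ProducerRecord<>("my-topic", line));
            }
            producer.flush(); // wait for acks before declaring the restore done
        }
    }
}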

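On the compacted-topics point: compaction is a per-topic setting
(cleanup.policy=compact), so the broker keeps only the latest record per
key and the topic stays bounded by key cardinality rather than by
retention time. A sketch of creating such a topic with the Java
AdminClient (topic name, partition and replica counts, and broker address
are placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactedTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-primary:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps the latest value per key instead of
            // deleting by age, so a reload replays one record per key.
            NewTopic topic = new NewTopic("customer-snapshots", 6, (short) 3)
                    .configs(Collections.singletonMap(
                            TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}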