Good points.

I would back up all partitions to HDFS (or similar) as fast as the data 
arrives. If the Kafka cluster becomes corrupted, the topics can be 
repopulated from the backup. In my case, all clients track their own 
offsets, so in theory they should be able to continue as if nothing had 
happened.
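
Roughly what I have in mind for the backup side, as a sketch against the 
Java clients (topic name, HDFS layout and the offset store below are 
placeholders, not what we actually run):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.io.OutputStream;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class PartitionBackup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder address
        props.put("enable.auto.commit", "false");        // we track offsets ourselves
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        TopicPartition tp = new TopicPartition("events", 0);  // hypothetical topic/partition
        long resumeFrom = readLastBackedUpOffset(tp);         // offset kept outside Kafka

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
             FileSystem hdfs = FileSystem.get(new Configuration())) {

            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, resumeFrom);

            Path out = new Path("/backup/events/partition-0"); // hypothetical HDFS layout
            try (OutputStream os = hdfs.exists(out) ? hdfs.append(out) : hdfs.create(out)) {
                while (true) {
                    ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<byte[], byte[]> r : records) {
                        os.write(r.value());      // a real backup would keep keys and offsets too
                        os.write('\n');
                        storeLastBackedUpOffset(tp, r.offset() + 1);
                    }
                    os.flush();
                }
            }
        }
    }

    // Placeholders for however the backup offset is persisted (file, DB, ZooKeeper, ...).
    static long readLastBackedUpOffset(TopicPartition tp) { return 0L; }
    static void storeLastBackedUpOffset(TopicPartition tp, long offset) { }
}

One such copier per partition keeps the backup in the same order as the log.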

Regenerating (duplicating?) data for non-production environments could then 
be done either from the production Kafka cluster or from the backup.
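
The restore/regeneration side is basically the reverse: stream the backed-up 
partition file back through a plain producer into whichever cluster you are 
targeting. Broker address, topic and file layout are again placeholders, and 
offsets will only line up with the originals if the backup is complete from 
offset 0 and replayed in order:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PartitionRestore {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "test-broker:9092");  // target cluster (placeholder)
        props.put("acks", "all");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        Path backup = new Path("/backup/events/partition-0"); // file written by the backup sketch above
        try (FileSystem hdfs = FileSystem.get(new Configuration());
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(hdfs.open(backup), StandardCharsets.UTF_8))) {

            String line;
            while ((line = reader.readLine()) != null) {
                // Replay into the same partition, in the original order (values only in this sketch).
                producer.send(new ProducerRecord<>("events", 0, null,
                        line.getBytes(StandardCharsets.UTF_8)));
            }
            producer.flush();
        }
    }
}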

> On 14 Mar 2016, at 12:32, Ben Stopford <b...@confluent.io> wrote:
> 
> - Compacted topics provide a useful way to retain meaningful datasets inside 
> the broker, which don’t grow indefinitely. If you have an update-in-place use 
> case, where the event sourced approach doesn’t buy you much, these will keep 
> the reload time down when you regenerate materialised views.  
> - When going down the master data store route a few different problems may 
> conflate. Disaster recovery, historic backups, regenerating data in non 
> production environments.  
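
(For the archives: a compacted topic is just a topic created with 
cleanup.policy=compact. A minimal sketch with the Java admin client; the 
topic name and sizing are made up:)

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collections;
import java.util.Properties;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps the latest value per key instead of deleting by time/size.
            NewTopic topic = new NewTopic("customer-latest", 12, (short) 3)   // hypothetical topic
                    .configs(Collections.singletonMap(
                            TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}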
