Re: Digg's data model

Chris Goffinet Sat, 20 Mar 2010 08:41:14 -0700

> 5. Backups: If there is a 4 or 5 TB Cassandra cluster, what do you recommend
> the backup scenarios could be?


For the worst-case scenario (total failure), we opted to take global snapshots
every 24 hours. Each snapshot creates hard links to the SSTables on each node,
and we copy those SSTables to HDFS daily. We also wrote a patch that sends
every event going into the commit log to Scribe, so we have a rolling commit
log in HDFS. If the entire cluster is corrupted, we can take the last 24-hour
snapshot plus the commit log from right after that snapshot and bring the
cluster back to the last known good state.
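
To make the two pieces concrete, here is a rough Python sketch of the idea, not
the actual Digg tooling or Cassandra's own snapshot code. The function names,
the flat directory layout, and the (timestamp, event) tuples are all
assumptions made up for illustration. The first function hard-links data files
into a snapshot directory, the way a snapshot avoids copying data; the second
picks out the logged events that would need replaying after a given snapshot.

```python
import os
from pathlib import Path


def snapshot_sstables(data_dir: Path, snapshot_dir: Path) -> list[Path]:
    """Hard-link every regular file in data_dir into snapshot_dir.

    Hypothetical helper: both directories must be on the same filesystem,
    because hard links cannot cross filesystems.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for src in sorted(data_dir.iterdir()):
        if src.is_file():
            dst = snapshot_dir / src.name
            os.link(src, dst)  # no data is copied; both names share one inode
            links.append(dst)
    return links


def entries_to_replay(log_entries, snapshot_time):
    """Return the (timestamp, event) pairs logged after the snapshot, oldest first.

    Hypothetical helper: log_entries stands in for the rolling commit log.
    """
    newer = [entry for entry in log_entries if entry[0] > snapshot_time]
    return sorted(newer, key=lambda entry: entry[0])
```

Because the snapshot is only a set of hard links, it is cheap to take and the
files can then be copied off the node at leisure. Recovery is then the latest
snapshot plus whatever entries_to_replay returns for that snapshot's time.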

-Chris
