On Mar 20, 2010, at 9:10 AM, Jeremy Dunck wrote:

> On Sat, Mar 20, 2010 at 10:40 AM, Chris Goffinet <goffi...@digg.com> wrote:
>>> 5. Backups: If there is a 4 or 5 TB Cassandra cluster, what backup scenarios would you recommend?
>>
>> Worst case scenario (total failure): we opted to do global snapshots every 24
>> hours. This creates hard links to the SSTables on each node. We copy those
>> SSTables to HDFS on a daily basis. We also wrote a patch that logs every event
>> going into the commit log to Scribe, giving us a rolling commit log in HDFS.
>> So in the event that the entire cluster corrupts, we can take the last 24-hour
>> snapshot plus the commit log written since that snapshot and get the cluster
>> back to the last known good state.
>
> Doesn't this leave you open to corruption you don't discover within 24 hours?
No. We aren't storing the actual commit log structure; we have our own.

-Chris
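[A minimal sketch of the daily snapshot-and-copy half of the workflow Chris describes above. This is not Digg's actual tooling; the node list, data directory, HDFS destination, and snapshot tag are all hypothetical, and the exact `nodetool snapshot` syntax varies by Cassandra version.]

```python
#!/usr/bin/env python
"""Illustrative sketch of the daily backup flow described above:
take a global snapshot (cheap hard links to live SSTables), then
copy the snapshotted SSTables into HDFS."""

import datetime
import os
import subprocess

NODES = ["cass-node1", "cass-node2"]     # hypothetical cluster nodes
DATA_DIR = "/var/lib/cassandra/data"     # hypothetical Cassandra data directory
HDFS_ROOT = "/backups/cassandra"         # hypothetical HDFS destination


def snapshot_and_copy():
    tag = datetime.date.today().strftime("snap-%Y%m%d")
    for node in NODES:
        # 1. Snapshot each node. A snapshot only creates hard links to the
        #    current SSTables, so it is fast and uses almost no extra disk.
        #    (Older nodetool versions take the tag positionally instead of -t.)
        subprocess.check_call(["nodetool", "-h", node, "snapshot", "-t", tag])

    # 2. Copy the snapshot directories into HDFS. This sketch assumes the
    #    script runs locally on each node (e.g. via cron) and can see DATA_DIR;
    #    a real setup would run this per node and in parallel.
    host = os.uname()[1]
    for root, _dirs, files in os.walk(DATA_DIR):
        if os.path.basename(root) == tag and files:
            dest = "%s/%s/%s" % (HDFS_ROOT, host, tag)
            subprocess.check_call(["hadoop", "fs", "-mkdir", "-p", dest])
            subprocess.check_call(["hadoop", "fs", "-put", root, dest])


if __name__ == "__main__":
    snapshot_and_copy()
```

[The commit-log-to-Scribe piece Chris mentions would sit alongside this: restoring means loading the last snapshot's SSTables back onto the nodes, then replaying the Scribe-captured events recorded after that snapshot.]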