I'm curious why you are storing the backups (SSTables and commit logs) in HDFS instead of something like Lustre. Are your backups using Hadoop's MapReduce somehow? Or is it just for convenience?
On Sat, Mar 20, 2010 at 8:40 AM, Chris Goffinet <goffi...@digg.com> wrote:
>
> 5. Backups: If there is a 4 or 5 TB Cassandra cluster, what would you
> recommend for backup scenarios?
>
> Worst case scenario (total failure): we opted to do global snapshots every
> 24 hours. This creates hard links to SSTables on each node. We copy those
> SSTables to HDFS on a daily basis. We also wrote a patch to log all events
> going into the commit log to Scribe, so we have a rolling commit log in
> HDFS. So in the event that the entire cluster corrupts, we can take the
> last 24-hour snapshot plus the commit log from right after that snapshot
> and get the cluster back to the last known good state.
>
> -Chris

--
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..." --Marlowe