Thanks, I think I'm getting some of the file layout/data structures now, so that helps with the backup strategy. I might still start simple, since simple is usually harder to screw up, but at least I'll know where I can go with something more clever.
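
A minimal sketch of that simple starting point (the "backup my keyspace(s) and system in a compressed tar" idea from further down the thread), in Python. The function name, the directory arguments, and the archive naming are all made up for illustration. It only assumes that each keyspace is a folder under one data directory, and that the files are not changing while the archive is written (for example, because it is pointed at a snapshot copy).

```python
import tarfile
import time
from pathlib import Path


def backup_keyspaces(data_dir, keyspaces, dest_dir, stamp=None):
    """Write one gzip-compressed tar holding the given keyspace folders.

    data_dir, keyspaces and dest_dir are hypothetical values supplied by
    the caller; nothing here talks to Cassandra itself.
    """
    data_dir = Path(data_dir)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # One archive per point in time, named by a UTC timestamp.
    stamp = stamp or time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    archive = dest_dir / f"cassandra-{stamp}.tar.gz"

    with tarfile.open(str(archive), "w:gz") as tar:
        for name in keyspaces:
            # arcname keeps paths relative, e.g. "system/..." in the archive.
            tar.add(str(data_dir / name), arcname=name)
    return archive
```

Something like `backup_keyspaces("/path/to/data", ["system", "myks"], "/path/to/backups")` would then leave one timestamped archive per run, ready to be copied to S3 by whatever upload tool is already in use. It needs enough free disk for the archive and gives no delta savings, which are the two trade-offs already noted in the thread.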

will

On Sat, Apr 30, 2011 at 9:15 AM, Jeremiah Jordan <jeremiah.jor...@morningstar.com> wrote:

> The files inside the keyspace folders are the SSTables.
>
> ------------------------------
> *From:* aaron morton [mailto:aa...@thelastpickle.com]
> *Sent:* Friday, April 29, 2011 4:49 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: best way to backup
>
> William,
> Some info on the sstables from me
> http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/
>
> If you want to know more, check out the BigTable and original Facebook
> papers, linked from the wiki
> http://wiki.apache.org/cassandra/ArchitectureOverview
>
> Aaron
>
> On 29 Apr 2011, at 23:43, William Oberman wrote:
>
> Dumb question, but referenced twice now: which files are the SSTables and
> why is backing them up incrementally a win?
>
> Or should I not bother to understand internals, and instead just roll with
> the "backup my keyspace(s) and system in a compressed tar" strategy, as
> while it may be excessive, it's guaranteed to work and work easily (which I
> like, a great deal).
>
> will
>
> On Fri, Apr 29, 2011 at 4:58 AM, Daniel Doubleday <daniel.double...@gmx.net> wrote:
>
>> What we are about to set up is a Time Machine-like backup. This is more
>> like an add-on to the S3 backup.
>>
>> Our boxes have an additional larger drive for local backup. We create a
>> new backup snapshot every x hours which hardlinks the files in the previous
>> snapshot (a bit like Cassandra's incremental_backups thing) and then we sync
>> that snapshot dir with the Cassandra data dir. We can do archiving / backup
>> to an external system from there without impacting the main data RAID.
>>
>> But the main reason to do this is to have an 'omg we screwed up big time
>> and deleted / corrupted data' recovery.
>>
>> On Apr 28, 2011, at 9:53 PM, William Oberman wrote:
>>
>> Even with N nodes for redundancy, I still want to have backups. I'm an
>> Amazon person, so naturally I'm thinking S3. Reading over the docs, and
>> messing with nodetool, it looks like each new snapshot contains the previous
>> snapshot as a subset (and I've read how Cassandra uses hard links to avoid
>> excessive disk use). When does that pattern break down?
>>
>> I'm basically debating if I can do an "rsync"-like backup, or if I should
>> do a compressed tar backup. And I obviously want multiple points in time.
>> S3 does allow file versioning, if a file or file name is changed/reused
>> over time (only matters in the rsync case). My only concerns with
>> compressed tars are that I'll have to have free space to create the archive
>> and I get no "delta" space savings on the backup (the former is solved by not
>> allowing the disk space to get so low and/or adding more nodes to bring down
>> the space, the latter is solved by S3 being really cheap anyway).
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue., First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) ober...@civicscience.com