William,

Some info on the SSTables from me: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

If you want to know more, check out the BigTable and original Facebook papers, linked from the wiki.

Aaron

On 29 Apr 2011, at 23:43, William Oberman wrote:

> Dumb question, but referenced twice now: which files are the SSTables, and why
> is backing them up incrementally a win?
>
> Or should I not bother to understand internals, and instead just roll with
> the "backup my keyspace(s) and system in a compressed tar" strategy? While
> it may be excessive, it's guaranteed to work and work easily (which I like, a
> great deal).
>
> will
>
> On Fri, Apr 29, 2011 at 4:58 AM, Daniel Doubleday <daniel.double...@gmx.net>
> wrote:
> What we are about to set up is a Time Machine-like backup. This is more like
> an add-on to the S3 backup.
>
> Our boxes have an additional, larger drive for local backup. We create a new
> backup snapshot every x hours which hardlinks the files in the previous
> snapshot (a bit like Cassandra's incremental_backups feature), and then we
> sync that snapshot dir with the Cassandra data dir. We can do archiving /
> backup to an external system from there without impacting the main data RAID.
>
> But the main reason to do this is to have an 'omg we screwed up big time and
> deleted / corrupted data' recovery.
>
> On Apr 28, 2011, at 9:53 PM, William Oberman wrote:
>
>> Even with N nodes for redundancy, I still want to have backups. I'm an
>> Amazon person, so naturally I'm thinking S3. Reading over the docs, and
>> messing with nodetool, it looks like each new snapshot contains the previous
>> snapshot as a subset (and I've read how Cassandra uses hard links to avoid
>> excessive disk use). When does that pattern break down?
>>
>> I'm basically debating whether I can do an "rsync"-like backup, or whether I
>> should do a compressed tar backup. And I obviously want multiple points in
>> time. S3 does allow file versioning, if a file or file name is changed/reused
>> over time (only matters in the rsync case). My only concerns with compressed
>> tars are that I'll need free space to create the archive, and I get no
>> "delta" space savings on the backup (the former is solved by not allowing
>> the disk space to get so low and/or adding more nodes to bring down the
>> space, the latter is solved by S3 being really cheap anyway).
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue., First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) ober...@civicscience.com