William,

Some background on SSTables from me:
http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

If you want to know more, check out the BigTable and original Facebook
papers, linked from the wiki.
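
On why incremental backup of SSTables is a win: an SSTable file is not modified once it has been written, so a file that is already in the last backup never needs to be copied again. Below is a minimal Python sketch of the hardlink snapshot scheme described further down the thread, not anyone's actual script. The function name take_snapshot and its parameters are made up for illustration, and it assumes that a file with the same name in the previous snapshot has the same contents.

```python
import os
import shutil


def take_snapshot(data_dir, new_snapshot, prev_snapshot=None):
    """Mirror data_dir into new_snapshot and return the names that were copied.

    Assumption (hypothetical helper, not a Cassandra API): files are never
    modified after they are written, so a file of the same name in
    prev_snapshot can be hard-linked instead of copied again.
    """
    os.makedirs(new_snapshot)
    copied = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch only handles a flat directory of files
        dst = os.path.join(new_snapshot, name)
        prev = os.path.join(prev_snapshot, name) if prev_snapshot else None
        if prev and os.path.isfile(prev):
            # Already backed up: a hard link costs no extra data blocks.
            os.link(prev, dst)
        else:
            # New since the last snapshot: this is the only real I/O.
            shutil.copy2(src, dst)
            copied.append(name)
    return copied
```

Both snapshot directories have to live on the same filesystem for the hard links to work. Files that compaction has removed from the data directory are simply not linked into the new snapshot, while older snapshots keep their own links to them.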

Aaron

On 29 Apr 2011, at 23:43, William Oberman wrote:

> Dumb question, but referenced twice now: which files are the SSTables and why 
> is backing them up incrementally a win?
> 
> Or should I not bother to understand internals, and instead just roll with 
> the "backup my keyspace(s) and system in a compressed tar" strategy, as while 
> it may be excessive, it's guaranteed to work and work easily (which I like, a 
> great deal).
> 
> will
> 
> On Fri, Apr 29, 2011 at 4:58 AM, Daniel Doubleday <daniel.double...@gmx.net> 
> wrote:
> What we are about to set up is a Time Machine-like backup. This is more of 
> an add-on to the S3 backup.
> 
> Our boxes have an additional, larger drive for local backup. We create a new 
> backup snapshot every x hours which hardlinks the files in the previous 
> snapshot (a bit like Cassandra's incremental_backups thing) and then we sync 
> that snapshot dir with the Cassandra data dir. We can do archiving / backup 
> to an external system from there without impacting the main data RAID.
> 
> But the main reason to do this is to have an 'omg we screwed up big time and 
> deleted / corrupted data' recovery.
> 
> On Apr 28, 2011, at 9:53 PM, William Oberman wrote:
> 
>> Even with N nodes for redundancy, I still want to have backups.  I'm an 
>> Amazon person, so naturally I'm thinking S3.  Reading over the docs, and 
>> messing with nodetool, it looks like each new snapshot contains the previous 
>> snapshot as a subset (and I've read how Cassandra uses hard links to avoid 
>> excessive disk use).  When does that pattern break down?  
>> 
>> I'm basically debating if I can do an "rsync"-like backup, or if I should do 
>> a compressed tar backup.  And I obviously want multiple points in time.  S3 
>> does allow file versioning, if a file or file name is changed/reused over 
>> time (only matters in the rsync case).  My only concerns with compressed 
>> tars are that I'll have to have free space to create the archive and I get no 
>> "delta" space savings on the backup (the former is solved by not allowing 
>> the disk space to get so low and/or adding more nodes to bring down the 
>> space, the latter is solved by S3 being really cheap anyway).
>> 
>> -- 
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue., First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) ober...@civicscience.com
> 
