RE: questions related to the SSTable file

java8964 java8964 Tue, 17 Sep 2013 18:52:22 -0700

Quote: 
"
To be clear, "incremental backup" feature backs up the data being modified in 
that period, because it writes only those files to the incremental backup dir 
as hard links, between full snapshots."
I thought I had this clear, but your clarification confused me again. From all 
the answers I have received so far, I believe a more accurate statement would 
be: the "incremental backup" feature backs up the SSTable files generated in 
that period. 
But there is no way we can be sure that these SSTable files will ONLY contain 
modified data. So the statement being quoted above is not exactly right. I 
agree that all the modified data in that period will be in the incremental 
sstable files, but a lot of other unmodified data will be in them too.
Suppose we have 2 rows with different row keys in the same memtable, and only 
the 2nd row is modified. When the memtable is flushed to an SSTable file, the 
file will contain both rows, and both will be in the incremental backup files. 
So nothing changed for the first row, but it will still be in the incremental 
backup.
Or suppose I have one row with one column, and a new column is added. If the 
whole row is in one memtable when it is flushed to an SSTable file, the whole 
row is also in the incremental backup. Nothing changed for the first column, 
but it will still be in the incremental backup file.
The point I am trying to make is that this matters if I design an ETL to 
consume the incremental backup SSTable files. As in the examples above, I have 
to be aware that the incremental backup SSTable files could, and most likely 
will, contain old data that has already been processed. That requires 
additional logic in the ETL, or in any outside SSTable consumer, to handle it.
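To make the extra ETL logic concrete, here is a minimal sketch of one way to skip already-processed data. It assumes some SSTable reader (not shown) has already turned each backup file into (row_key, column, write_timestamp, value) tuples. The function name `new_cells` and the `last_seen` map are hypothetical, and deletes (tombstones) are not handled.

```python
# Hypothetical sketch: cells are (row_key, column, write_timestamp, value)
# tuples already extracted from an SSTable by some reader that is not shown.

def new_cells(cells, last_seen):
    """Yield only cells newer than what the ETL has already processed.

    last_seen maps (row_key, column) -> highest write timestamp processed
    so far. It is updated in place as cells are yielded.
    """
    for row_key, column, ts, value in cells:
        key = (row_key, column)
        if ts > last_seen.get(key, -1):
            last_seen[key] = ts
            yield row_key, column, ts, value


last_seen = {}
first_backup = [("row1", "a", 100, "x"), ("row2", "a", 100, "y")]
# row1 is unchanged, but it shows up again in the second backup file.
second_backup = [("row1", "a", 100, "x"), ("row2", "a", 200, "z")]

processed_first = list(new_cells(first_backup, last_seen))
processed_second = list(new_cells(second_backup, last_seen))
```

In this toy run the second pass keeps only the changed cell for row2. A real consumer would need to persist `last_seen` (or a coarser watermark) between runs.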
Yong
Date: Tue, 17 Sep 2013 18:01:45 -0700
Subject: Re: questions related to the SSTable file
From: rc...@eventbrite.com
To: user@cassandra.apache.org

On Tue, Sep 17, 2013 at 5:46 PM, Takenori Sato <ts...@cloudian.com> wrote:

> So in fact, incremental backup of Cassandra is just hard link all the new 
> SSTable files being generated during the incremental backup period. It could 
> contain any data, not just the data being update/insert/delete in this 
> period, correct?

Correct.
But over time, some old enough SSTable files are usually shared across multiple 
snapshots. 

To be clear, "incremental backup" feature backs up the data being modified in 
that period, because it writes only those files to the incremental backup dir 
as hard links, between full snapshots.

http://www.datastax.com/docs/1.0/operations/backup_restore
"When incremental backups are enabled (disabled by default), Cassandra 
hard-links each flushed SSTable to a backups directory under the keyspace data 
directory. This allows you to store backups offsite without transferring entire 
snapshots. Also, incremental backups combine with snapshots to provide a 
dependable, up-to-date backup mechanism.
"

What Takenori is referring to is that a full snapshot is in some ways an 
"incremental backup" because it shares hard linked SSTables with other 
snapshots.
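As a rough illustration of what "shares hard linked SSTables" means at the filesystem level, the sketch below hard-links one file into two directories and checks that both entries refer to the same underlying file. The file name, directory layout, and the helper `share_sstable` are made up for the example. This is not Cassandra's own code.

```python
import os
import tempfile


def share_sstable(src_path, dest_dir):
    """Hard-link src_path into dest_dir instead of copying it.

    A stand-in for how a snapshot or backup directory can reference an
    existing SSTable file without duplicating its bytes.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    os.link(src_path, dest_path)
    return dest_path


root = tempfile.mkdtemp()
data_file = os.path.join(root, "example-Data.db")  # hypothetical file name
with open(data_file, "wb") as f:
    f.write(b"sstable bytes")

snap1 = share_sstable(data_file, os.path.join(root, "snapshots", "s1"))
snap2 = share_sstable(data_file, os.path.join(root, "snapshots", "s2"))

same_file = os.path.samefile(snap1, snap2)   # both names point at one file
link_count = os.stat(data_file).st_nlink     # original + two hard links
```

Because each snapshot entry is just another name for the same file, an old SSTable that appears in several snapshots takes up disk space only once.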

=Rob
