On 6/6/2011 11:25 PM, Benjamin Coverston wrote:
>> Currently, my data dir has about 16 sets of .db files. I thought that
>> compaction (with nodetool) would clean up these files, but it doesn't.
>> Neither does cleanup or repair.
> You're not even talking about snapshots taken with nodetool snapshot yet.
> Also, nodetool compact does compact all of the live files; however, the
> compacted SSTables will not be cleaned up until a garbage collection is
> triggered or a capacity threshold is met.
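To see that on disk, here is a rough sketch (mine, not anything shipped
with Cassandra) that scans a keyspace's data directory and reports which
SSTable generations are already obsolete but still waiting to be deleted.
It assumes the component naming used by releases of roughly this vintage,
where a zero-length <ColumnFamily>-<generation>-Compacted marker flags an
SSTable that compaction has superseded; adjust the pattern if your version
names its files differently.

#!/usr/bin/env python
# Rough sketch: report which SSTable generations in a Cassandra data
# directory are obsolete (already compacted away) but not yet deleted.
# Assumption: a zero-length "<CF>-<generation>-Compacted" marker file
# flags an obsolete SSTable, as in releases of roughly this vintage.
import os
import re
import sys
from collections import defaultdict

# Component files look like <CF>-<generation>-<Component>.db, plus an
# optional "<CF>-<generation>-Compacted" marker with no extension.
PATTERN = re.compile(r'^(?P<cf>.+)-(?P<gen>\d+)-(?P<comp>[A-Za-z]+)(\.db)?$')

def scan(data_dir):
    sizes = defaultdict(int)   # (cf, generation) -> bytes still on disk
    obsolete = set()           # generations flagged by a Compacted marker
    for name in os.listdir(data_dir):
        m = PATTERN.match(name)
        if not m:
            continue
        key = (m.group('cf'), int(m.group('gen')))
        sizes[key] += os.path.getsize(os.path.join(data_dir, name))
        if m.group('comp') == 'Compacted':
            obsolete.add(key)
    return sizes, obsolete

if __name__ == '__main__':
    sizes, obsolete = scan(sys.argv[1])
    for (cf, gen), nbytes in sorted(sizes.items()):
        state = 'obsolete, awaiting delete' if (cf, gen) in obsolete else 'live'
        print('%s-%d: %.1f MB (%s)' % (cf, gen, nbytes / 1e6, state))

Anything it reports as obsolete should disappear on its own once a GC
runs (a node restart also clears it out); there is no need to remove
anything by hand.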
OK, so after a compaction, Cass is still not done with the older sets of
.db files and I should let it delete them? But I thought one of the main
purposes of compaction was to reclaim disk space. I'm only playing around
with a small data set, so I can't tell how fast the data grows, and I'm
trying to plan my storage requirements. Is each newly generated set as
large as the previous one?
The reason I ask is that it seems a snapshot is...
>> Q1: Should the files with the lower index #'s (under the
>> data/{keyspace} directory) be manually deleted? Or do ALL of the files
>> in this directory need to be backed up?
> Do not ever delete files in your data directory if you care about data
> on that replica, unless they are from a column family that no longer
> exists on that server. There may be some duplicate data in the files,
> but if the files are in the data directory, as a general rule, they
> are there because they contain some set of data that is in none of the
> other SSTables.
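Benjamin's point is easier to see with a toy model (plain Python, not
Cassandra code): a read merges every SSTable generation and keeps the
newest value per column, so an older file can still hold columns that no
newer file ever rewrote, and deleting it would silently drop them.

# Toy model only -- not Cassandra code.  Each "sstable" maps
# row key -> {column: (timestamp, value)}; a read merges all generations
# and keeps the newest timestamp for each column.
def merged_read(row_key, generations):
    result = {}
    for sstable in generations:                     # oldest to newest
        for col, (ts, val) in sstable.get(row_key, {}).items():
            if col not in result or ts > result[col][0]:
                result[col] = (ts, val)
    return result

gen1 = {'user:42': {'name': (10, 'alice'), 'email': (10, 'a@example.org')}}
gen2 = {'user:42': {'email': (20, 'alice@example.org')}}  # only email rewritten

print(merged_read('user:42', [gen1, gen2]))
# -> {'name': (10, 'alice'), 'email': (20, 'alice@example.org')}
# Dropping gen1 because gen2 is "newer" would lose the 'name' column.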
... It seems a snapshot is implemented, unsurprisingly, as just a link to
the latest (highest-indexed) set, not the previous sets. So, obviously,
only the latest *.db files will get backed up. Therefore, the previous
sets must be worthless.
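Rather than guessing from file names, one way to check exactly what a
snapshot references is to compare inode numbers, since a hard link shares
its inode with the live file. A small sketch; the two directory arguments
are whatever your keyspace's data directory and one of its snapshot
directories actually are:

# Sketch: for each file in a snapshot directory, report whether it is a
# hard link to a file still present in the live data directory
# (same device and inode number) or is only referenced by the snapshot.
import os
import sys

def inode_map(directory):
    ids = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            st = os.stat(path)
            ids[name] = (st.st_dev, st.st_ino)
    return ids

if __name__ == '__main__':
    live_dir, snapshot_dir = sys.argv[1], sys.argv[2]
    live_inodes = set(inode_map(live_dir).values())
    for name, ident in sorted(inode_map(snapshot_dir).items()):
        if ident in live_inodes:
            print('%s: hard link to a live SSTable' % name)
        else:
            print('%s: only referenced by the snapshot' % name)

For example, if you saved the sketch as check_snapshot.py (the script name,
keyspace, and snapshot tag below are illustrative; only /var/lib/cassandra/data
is the stock default):
python check_snapshot.py /var/lib/cassandra/data/MyKeyspace
/var/lib/cassandra/data/MyKeyspace/snapshots/<tag>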