On 6/6/2011 11:25 PM, Benjamin Coverston wrote:
Currently, my data dir has about 16 sets of .db files. I thought that compaction (with nodetool) would clean up these files, but it doesn't. Neither does cleanup or repair.

You're not even talking about snapshots (taken with nodetool snapshot) yet. Also, nodetool compact does compact all of the live files; however, the compacted SSTables will not be cleaned up until a garbage collection is triggered or a capacity threshold is met.
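
Concretely, it looks something like this (MyKeyspace/MyCF are placeholders, the data path assumes the default location, and the exact flags vary a bit between versions):

    # major compaction of one column family
    nodetool -h localhost compact MyKeyspace MyCF

    # the pre-compaction SSTables are still on disk afterwards...
    ls /var/lib/cassandra/data/MyKeyspace/

    # ...and only get unlinked once a full GC runs in the JVM (e.g. after
    # invoking the gc() operation on the java.lang:type=Memory MBean from
    # jconsole) or once Cassandra decides it needs the space back.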

Ok, so after a compaction, Cass is still not done with the older sets of .db files, and I should let Cass delete them? But I thought one of the main purposes of compaction was to reclaim disk storage. I'm only playing around with a small data set, so I can't tell how fast the data grows, and I'm trying to plan my storage requirements. Is each newly generated set as large as the previous ones?
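
(Side question: would nodetool cfstats tell me this? I'm assuming its "Space used (live)" vs. "Space used (total)" lines are exactly the live-SSTables-versus-everything-still-on-disk distinction, but I may be reading them wrong.)

    # eyeball how much of the on-disk space is still "live"
    nodetool -h localhost cfstats | grep -E "Column Family:|Space used"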

The reason I ask is that it seems a snapshot is...

Q1: Should the files with the lower index #'s (under the data/{keyspace} directory) be manually deleted? Or do ALL of the files in this directory need to be backed up?
Do not ever delete files in your data directory if you care about the data on that replica, unless they belong to a column family that no longer exists on that server. There may be some duplicate data across the files, but as a general rule, if a file is in the data directory it is there because it contains some data that exists in none of the other SSTables.
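
The files that are safe for Cassandra itself to remove are the ones it has already compacted away; if I remember right, those are flagged with a zero-length "...-Compacted" marker file and get deleted on the next GC without you doing anything by hand:

    # list SSTables that have already been merged into a newer generation
    # (marker naming is from memory and may differ between versions)
    ls /var/lib/cassandra/data/MyKeyspace/*-Compacted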

... It seems a snapshot is implemented, unsurprisingly, as just links to the latest (highest-indexed) set, not to the previous sets. So, obviously, only the latest *.db files will get backed up. Therefore, the previous sets must be worthless.
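
For reference, by "snapshot" here I mean roughly this (the tag name is arbitrary, the data path is just the default as far as I know, and newer nodetool versions may want a -t flag for the tag):

    nodetool -h localhost snapshot mybackup
    ls -l /var/lib/cassandra/data/MyKeyspace/snapshots/mybackup/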
