A couple more questions.

When these rows are deleted, tombstones will be created and stored in more recent sstables. Upon compaction of those sstables, and after gc_grace_period, I presume Cassandra will have removed all traces of the rows from disk.

However, after deleting such a large amount of information, there is no guarantee that Cassandra will compact the sstables holding the tombstones together with the sstables holding the original data, which is what would actually remove the data from disk (right?). Therefore, even after gc_grace_period, a large amount of space may still be used.
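
To make the concern above concrete, here is a minimal Python sketch of the purge rule as I understand it: a tombstone can be dropped only once gc_grace has elapsed and the row key does not appear in any sstable left out of the compaction. The `SSTable` class and the `compact` function are hypothetical names used for illustration, and the model is a simplification (it assumes every tombstone is newer than the data it shadows), not Cassandra's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    """Toy sstable: a set of live row keys plus row tombstones."""
    name: str
    live_rows: set = field(default_factory=set)
    tombstones: dict = field(default_factory=dict)  # row key -> deletion time (s)


def compact(selected, others, now, gc_grace):
    """Merge `selected` sstables; `others` are sstables left out of this compaction."""
    # Keep the newest tombstone seen for each row key.
    tombstones = {}
    for table in selected:
        for key, deleted_at in table.tombstones.items():
            tombstones[key] = max(deleted_at, tombstones.get(key, deleted_at))

    # Data shadowed by a tombstone inside this compaction is dropped
    # (assumption: the tombstone is always newer than the data).
    live = set()
    for table in selected:
        live |= table.live_rows
    live -= set(tombstones)

    # Row keys that still exist in sstables outside this compaction.
    keys_elsewhere = set()
    for table in others:
        keys_elsewhere |= table.live_rows | set(table.tombstones)

    # A tombstone survives if gc_grace has not elapsed yet, or if the row
    # may still exist in an sstable that was not part of this compaction.
    kept = {key: ts for key, ts in tombstones.items()
            if now - ts < gc_grace or key in keys_elsewhere}
    return SSTable("merged", live, kept)


DAY = 86400
old = SSTable("old", live_rows={"a", "b"})
new = SSTable("new", tombstones={"a": 0, "b": 0})

# Only the newer sstable is compacted: tombstones stay, old data stays on disk.
minor = compact([new], [old], now=11 * DAY, gc_grace=10 * DAY)
# Both sstables are compacted together: data and tombstones are both gone.
major = compact([old, new], [], now=11 * DAY, gc_grace=10 * DAY)
print(minor.tombstones, major.live_rows, major.tombstones)
```

In this model the space is only reclaimed in the second case, which is the situation described above: elapsed gc_grace alone is not enough if the old data sits in an sstable that is never compacted with the tombstones.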

Is there a way, other than a major compaction, to clean up all this old data? I assume a nodetool scrub will clean up old tombstones only if that row is not in another sstable?

Do tombstones take up bloom filter space after gc_grace_period?

-Mike

On 1/2/2013 6:41 PM, aaron morton wrote:
1) As one can imagine, the index and bloom filter for this column family are 
large.  Am I correct to assume that bloom filter and index space will not be 
reduced until after gc_grace_period?
Yes.

2) If I manually run repair across the cluster, is there a process I can 
use to safely remove these tombstones before gc_grace_period to free this 
memory sooner?
There is nothing to specifically purge tombstones.

You can temporarily reduce gc_grace_seconds and then trigger compaction, 
either by reducing min_compaction_threshold to 2 and doing a flush, or by 
kicking off a user defined compaction using the JMX interface.
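
As a rough sketch of the first option, the Python snippet below only assembles the statements one might run; it does not execute anything. The keyspace and column family names are hypothetical, the temporary gc_grace of 3600 seconds is an arbitrary example value, and the cassandra-cli and nodetool syntax shown is an assumption based on 1.1-era tooling that should be checked against your version. Lowering gc_grace is only safe once repair has completed, so that every replica already holds the tombstones.

```python
def purge_plan(keyspace, column_family, temp_gc_grace=3600,
               default_gc_grace=864000, default_min_threshold=4):
    """Return the statements for a temporary gc_grace reduction, in order.

    Nothing is executed here; the strings are meant to be reviewed and run
    by hand. The command syntax is an assumption, not verified output.
    """
    # Step 1 (cassandra-cli): lower gc_grace and the min compaction threshold.
    lower = ("use {ks}; update column family {cf} with gc_grace = {gc} "
             "and min_compaction_threshold = 2;").format(
                 ks=keyspace, cf=column_family, gc=temp_gc_grace)

    # Step 2 (shell, on each node): flush so a new sstable appears and
    # a minor compaction becomes eligible to run.
    flush = "nodetool flush {ks} {cf}".format(ks=keyspace, cf=column_family)

    # Step 3 (cassandra-cli): restore the defaults once compaction is done.
    restore = ("use {ks}; update column family {cf} with gc_grace = {gc} "
               "and min_compaction_threshold = {mt};").format(
                   ks=keyspace, cf=column_family,
                   gc=default_gc_grace, mt=default_min_threshold)

    return [lower, flush, restore]


# Hypothetical names, for illustration only.
for step in purge_plan("MyKeyspace", "MyColumnFamily"):
    print(step)
```

The restore step matters: leaving gc_grace low means a node that is down for longer than that window can bring deleted rows back when it rejoins.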

3) Any words of warning when undergoing this?
Make sure you have a good breakfast.
(It's more general advice than Cassandra specific.)


Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 30/12/2012, at 8:51 AM, Mike <mthero...@yahoo.com> wrote:

Hello,

We are undergoing a change to our internal data model that will result in the 
eventual deletion of over a hundred million rows from a Cassandra column 
family.  From what I understand, this will result in the generation of 
tombstones, which will be cleaned up during compaction, after gc_grace_period 
time (default: 10 days).

A couple of questions:

1) As one can imagine, the index and bloom filter for this column family are 
large.  Am I correct to assume that bloom filter and index space will not be 
reduced until after gc_grace_period?

2) If I manually run repair across the cluster, is there a process I can 
use to safely remove these tombstones before gc_grace_period to free this 
memory sooner?

3) Any words of warning when undergoing this?

We are running Cassandra 1.1.2 on a 6-node cluster with a replication factor of 
3.  We use LOCAL_QUORUM consistency for all operations.

Thanks!
-Mike
