A couple more questions.
When these rows are deleted, tombstones will be created and stored in
more recent sstables. Upon compaction of sstables, and after
gc_grace_period, I presume Cassandra will have removed all traces of
that row from disk.
However, after deleting such a large amount of information, there is no
guarantee that Cassandra will compact the sstables holding the tombstones
together with the sstables holding the original rows, which is what would
cause the data to be deleted (right?). Therefore, even after gc_grace_period,
a large amount of space may still be used.
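
A minimal sketch of the concern described above, as a toy model rather than
Cassandra's actual code. It assumes a tombstone that is already past
gc_grace can only be dropped when no sstable outside the current compaction
still holds the same row. The names SSTable and can_purge are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    """Toy stand-in for an sstable: a name plus the row keys it contains."""
    name: str
    row_keys: set = field(default_factory=set)


def can_purge(row_key, compacting, all_sstables):
    """Return True if a tombstone for row_key (assumed past gc_grace) could
    be dropped by compacting only the sstables in `compacting`.

    Assumption: the tombstone must be kept while any sstable that is NOT
    part of this compaction still contains the row.
    """
    compacting_names = {s.name for s in compacting}
    outside = [s for s in all_sstables if s.name not in compacting_names]
    return not any(row_key in s.row_keys for s in outside)


# The old sstable holds the original data; the new one holds the tombstone.
old = SSTable("old", {"row1", "row2"})
new = SSTable("new", {"row1"})
other = SSTable("other", {"row2"})
tables = [old, new, other]

purge_alone = can_purge("row1", [new], tables)          # old still has row1
purge_together = can_purge("row1", [old, new], tables)  # both compacted
```

In this model the tombstone survives a compaction that leaves the old sstable
out, which is why space can remain in use after gc_grace_period.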
Is there a way, other than a major compaction, to clean up all this old
data? I assume a nodetool scrub will clean up old tombstones only if
that row is not in another sstable?
Do tombstones take up bloomfilter space after gc_grace_period?
-Mike
On 1/2/2013 6:41 PM, aaron morton wrote:
1) As one can imagine, the index and bloom filter for this column family are
large. Am I correct to assume that bloom filter and index space will not be
reduced until after gc_grace_period?
Yes.
2) If I manually run repair across the cluster, is there a process I can
use to safely remove these tombstones before gc_grace_period to free this
memory sooner?
There is nothing to specifically purge tombstones.
You can temporarily reduce gc_grace_seconds and then trigger compaction,
either by reducing min_compaction_threshold to 2 and doing a flush, or by
kicking off a user-defined compaction through the JMX interface.
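
To illustrate why lowering gc_grace_seconds helps, here is a small sketch of
the timing arithmetic only. It assumes a tombstone becomes eligible for
removal once its deletion time is older than "now minus gc_grace_seconds".
The function names are hypothetical, and the sketch ignores everything else
compaction checks.

```python
DAY = 24 * 60 * 60  # seconds in a day


def gc_before(now_seconds, gc_grace_seconds):
    """Cutoff time: tombstones written before this are past their grace."""
    return now_seconds - gc_grace_seconds


def is_past_grace(deletion_time, now_seconds, gc_grace_seconds):
    """True if a tombstone written at deletion_time is older than the cutoff."""
    return deletion_time < gc_before(now_seconds, gc_grace_seconds)


now = 100 * DAY
deleted_at = 97 * DAY  # rows deleted three days ago

# With the default of 10 days the tombstone is still inside its grace period.
eligible_default = is_past_grace(deleted_at, now, 10 * DAY)

# With the setting temporarily lowered to 1 day it is already past it.
eligible_lowered = is_past_grace(deleted_at, now, 1 * DAY)
```

The usual caveat applies: if a replica missed the delete and the tombstone is
purged before repair has run, the deleted data can reappear, so run repair
first and restore the original setting afterwards.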
3) Any words of warning when undergoing this?
Make sure you have a good breakfast.
(It's more general advice than Cassandra specific.)
Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 30/12/2012, at 8:51 AM, Mike <mthero...@yahoo.com> wrote:
Hello,
We are undergoing a change to our internal data model that will result in the
eventual deletion of over a hundred million rows from a Cassandra column
family. From what I understand, this will result in the generation of
tombstones, which will be cleaned up during compaction, after gc_grace_period
time (default: 10 days).
A couple of questions:
1) As one can imagine, the index and bloom filter for this column family are
large. Am I correct to assume that bloom filter and index space will not be
reduced until after gc_grace_period?
2) If I manually run repair across the cluster, is there a process I can
use to safely remove these tombstones before gc_grace_period to free this
memory sooner?
3) Any words of warning when undergoing this?
We are running Cassandra 1.1.2 on a 6-node cluster with a replication factor
of 3. We use LOCAL_QUORUM consistency for all operations.
Thanks!
-Mike