This is what I thought. I was hoping there might be another way to reclaim
the space.
The problem is that the more data you have, the longer Cassandra takes to
respond.
Reclaiming the space of deleted rows in the biggest SSTable requires a major
compaction. This compaction is triggered either by adding twice as much data
(or four times as much with the default configuration) to the system, or by
running it manually through JMX.
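The x2/x4 figure follows from size-tiered compaction: the largest SSTable is
only rewritten once enough SSTables of its size exist to fill a bucket of
min_threshold (4 by default). Below is a minimal, idealized Python sketch of
that arithmetic. It assumes equal-sized flushes, buckets of exactly equal size
and no deletes; real Cassandra groups SSTables of merely similar size, and the
function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def largest_sstable_rewrites(min_threshold: int, flushes: int) -> List[Tuple[int, int]]:
    """Idealized size-tiered compaction (assumption: no deletes, exact sizes).

    Every flush writes one SSTable of size 1; as soon as min_threshold
    SSTables of the same size exist, they merge into one SSTable whose size
    is their sum. Returns (total_data_written, new_largest_size) each time a
    compaction produces a new largest SSTable.
    """
    tiers = Counter()  # SSTable size -> number of SSTables of that size
    largest = 0
    events = []
    for total in range(1, flushes + 1):
        size = 1
        tiers[size] += 1
        # Cascade: a merged SSTable may complete the bucket one tier up.
        while tiers[size] >= min_threshold:
            tiers[size] -= min_threshold
            size *= min_threshold
            tiers[size] += 1
            if size > largest:
                largest = size
                events.append((total, size))
    return events


print(largest_sstable_rewrites(4, 64))  # [(4, 4), (16, 16), (64, 64)]
print(largest_sstable_rewrites(2, 8))   # [(2, 2), (4, 4), (8, 8)]
```

In this model the biggest SSTable, created when the total data was S, is not
touched again until the total reaches min_threshold * S, which is why deleted
rows inside it can linger for a long time.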
For a system that deletes data regularly and must serve customers all day
with response times in milliseconds, this is a problem.

It appears to me that to use Cassandra, you must have a process that
triggers a major compaction on the nodes every X amount of time. Cassandra
fits well in two cases: when you don't delete data (or hardly ever do), and
when your upper limit on response time is high enough that a major
compaction will not hurt you.
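Such a periodic process could look like the minimal Python sketch below. It
assumes the job is started on a schedule (for example from cron), that
nodetool is on the PATH, and that nodetool compact blocks until the
compaction finishes. The host addresses, the keyspace name and the function
names are hypothetical placeholders.

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence


def compact_command(host: str, keyspace: str) -> List[str]:
    # nodetool's "compact" command asks the node (over JMX) to run a major compaction.
    return ["nodetool", "-h", host, "compact", keyspace]


def _run_checked(cmd: List[str]) -> None:
    # Raises CalledProcessError if nodetool exits with a non-zero status.
    subprocess.run(cmd, check=True)


def rolling_major_compaction(
    hosts: Sequence[str],
    keyspace: str,
    runner: Optional[Callable[[List[str]], object]] = None,
    pause_seconds: float = 0.0,
) -> List[str]:
    """Compact one node at a time so the other replicas keep serving reads."""
    run = runner if runner is not None else _run_checked
    done = []
    for i, host in enumerate(hosts):
        run(compact_command(host, keyspace))  # assumed to block until finished
        done.append(host)
        # Optional breathing room between nodes, skipped after the last one.
        if pause_seconds and i < len(hosts) - 1:
            time.sleep(pause_seconds)
    return done


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    rolling_major_compaction(["10.0.0.1", "10.0.0.2"], "Keyspace1", runner=print)
```

Compacting nodes one at a time, rather than all at once, only helps if the
remaining replicas of each row can absorb the read load while one node is busy.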

It might be that the only way to solve this problem is to keep at least two
copies of each row in each data center and use a dynamic snitch.

Shimi

On Mon, Jan 3, 2011 at 7:55 PM, Peter Schuller
<peter.schul...@infidyne.com> wrote:

> > Major compaction does it, but only if GCGraceSeconds has elapsed. See:
> >
> > http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html
>
> But to be clear, under the assumption that your data is a lot bigger
> than the tombstones, a major compaction will definitely reclaim space
> even if GCGraceSeconds has not elapsed. So actually my original
> response is a bit misleading.
>
> --
> / Peter Schuller
>
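The distinction between dropping shadowed data and dropping the tombstone
itself can be illustrated with a minimal, simplified Python sketch. It
assumes a single clock for write and deletion times and a fixed tombstone
size of 32 bytes (a hypothetical figure); the 864000-second value is
Cassandra's default GCGraceSeconds (10 days). The class and function names
are hypothetical and do not mirror Cassandra's internals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DEFAULT_GC_GRACE_SECONDS = 864_000  # 10 days


@dataclass
class Column:
    name: str
    size_bytes: int
    timestamp: int  # write time, same clock as deletions (simplification)


@dataclass
class Tombstone:
    deleted_at: int
    size_bytes: int = 32  # hypothetical on-disk size


def compact_row(
    columns: List[Column],
    tombstone: Optional[Tombstone],
    now: int,
    gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS,
) -> Tuple[List[Column], Optional[Tombstone]]:
    """Merge a row's columns with its row tombstone, as a compaction would."""
    if tombstone is None:
        return columns, None
    # Data written at or before the deletion is shadowed and always dropped.
    survivors = [c for c in columns if c.timestamp > tombstone.deleted_at]
    # The tombstone itself is only purged once GCGraceSeconds has elapsed.
    expired = now - tombstone.deleted_at > gc_grace_seconds
    return survivors, (None if expired else tombstone)


def row_size(columns: List[Column], tombstone: Optional[Tombstone]) -> int:
    return sum(c.size_bytes for c in columns) + (tombstone.size_bytes if tombstone else 0)


cols = [Column(f"c{i}", 1000, timestamp=100) for i in range(10)]
ts = Tombstone(deleted_at=200)
print(row_size(cols, ts))                           # 10032
print(row_size(*compact_row(cols, ts, now=300)))    # 32: data gone, tombstone kept
```

Before GCGraceSeconds has elapsed only the small tombstone survives, so almost
all of the space is reclaimed as long as the deleted data is much larger than
its tombstones.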
