Re: repair, compaction, and tombstone rows

horschi Fri, 02 Nov 2012 02:47:00 -0700

Hi Sylvain,

Might I ask why repair cannot simply ignore anything that is older than
gc-grace (like Aaron proposed)? I agree that repair should not process any
tombstones or anything. But in my mind it sounds reasonable to make repair
ignore timed-out data. Because the timestamp is created on the client,
there is no reason to repair these, right?
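
A minimal sketch of the check being proposed, in Python. This is not Cassandra's code: the function names and the dictionary layout of a "cell" are hypothetical, and the only assumption is that a tombstone becomes collectable once gc_grace has elapsed since its deletion time.

```python
import time


def is_gcable(deletion_time, gc_grace_seconds, now=None):
    """Return True if a tombstone's gc_grace window has fully elapsed."""
    if now is None:
        now = time.time()
    return deletion_time + gc_grace_seconds < now


def cells_to_repair(cells, gc_grace_seconds, now=None):
    """Keep live cells and tombstones that are still within gc_grace.

    Each cell is a dict with a "deletion_time" key (hypothetical layout):
    None for live data, otherwise the deletion time in epoch seconds.
    """
    return [
        cell for cell in cells
        if cell["deletion_time"] is None
        or not is_gcable(cell["deletion_time"], gc_grace_seconds, now)
    ]
```

With a filter like this, anything already past gc_grace would simply be left out of what repair compares and streams.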


We are using TTLs quite heavily, and I noticed that every repair
increases the load of all nodes by 1-2 GB, where each node has about
20-30 GB of data. I don't know if this increases with the data volume. The
data is mostly time-series data.
I even noticed an increase when running two repairs directly after each
other. So even when data was just repaired, there is still data being
transferred. I assume this is due to some columns timing out within that
timeframe and the entire row being repaired.

regards,
Christian

On Thu, Nov 1, 2012 at 9:43 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> > Is this a feature or a bug?
>
> Neither really. Repair doesn't do any gcable tombstone collection and
> it would be really hard to change that (besides, it's not its job). So
> if, when you run repair, there are sstables with tombstones that could
> be collected but have not been yet, then yes, they will be streamed. Now the
> theory is that compaction will run often enough that gcable tombstones
> will be collected in a reasonably timely fashion and so you will never
> have lots of such tombstones in general (making the fact that repair
> streams them largely irrelevant). That being said, in practice, I don't
> doubt that there are a few scenarios like your own where this can still
> lead to doing too much useless work.
>
> I believe the main problem is that size-tiered compaction has a
> tendency to not compact the largest sstables very often, meaning that
> you could have large sstables with mostly gcable tombstones sitting
> around. In the upcoming Cassandra 1.2,
> https://issues.apache.org/jira/browse/CASSANDRA-3442 will fix that.
> Until then, if you are not afraid of a little bit of scripting, one
> option could be to run a small script before each repair that
> checks the creation time of your sstables. If an sstable is old
> enough (how old depends on the TTL you use
> on all your columns), you may want to force a compaction of that
> sstable (using the JMX call forceUserDefinedCompaction()). The goal is
> to get rid of as many outdated tombstones as possible before running the
> repair (alternatively, you could run a major compaction prior to
> the repair, but major compactions have a lot of nasty effects, so I
> wouldn't recommend that a priori).
>
> --
> Sylvain
>
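
The "small script" suggested above could look roughly like the following Python sketch. It only lists candidate sstables and does not trigger any compaction. Assumptions, none of which come from the thread: sstable data files are recognized by the "-Data.db" suffix, the file modification time is used as a stand-in for creation time (sstables are not rewritten in place), and old_sstables is a hypothetical helper name.

```python
import os
import time


def old_sstables(data_dir, max_age_seconds, now=None):
    """Return paths of *-Data.db files older than max_age_seconds, oldest first."""
    if now is None:
        now = time.time()
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            # Only the data component of each sstable is of interest here.
            if not name.endswith("-Data.db"):
                continue
            path = os.path.join(root, name)
            # Modification time is used as an approximation of creation time.
            mtime = os.path.getmtime(path)
            if now - mtime > max_age_seconds:
                found.append((mtime, path))
    return [path for _mtime, path in sorted(found)]
```

One plausible threshold is the TTL plus gc_grace, in seconds, since data in an sstable older than that should be both expired and collectable; that choice is a guess, not something stated in the thread. The returned files would then be passed to forceUserDefinedCompaction() through a JMX client before starting the repair.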
