> Is this a feature or a bug?

Neither, really. Repair doesn't do any collection of gcable tombstones, and it would be really hard to change that (besides, it's not its job). So if, when you run repair, there are sstables containing tombstones that could be collected but haven't been yet, then yes, they will be streamed. The theory is that compaction runs often enough that gcable tombstones get collected in a reasonably timely fashion, so you should never have lots of them lying around (which makes the fact that repair streams them largely irrelevant). That being said, I don't doubt that in practice there are a few scenarios, like yours, where this can still lead to a lot of useless work.
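To make "gcable" concrete, here is a deliberately simplified Python model. The `Tombstone` type and the function names are hypothetical, not Cassandra's code. A tombstone becomes purgeable once its local deletion time (for an expired TTL column, the expiry time) is older than `gc_grace_seconds`. Compaction may drop those; repair, which streams sstable data as it is on disk, sends both kinds in the ranges it streams. Real compaction applies further checks (for instance, it avoids purging a row's tombstones if that row may also appear in sstables outside the compaction), which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tombstone:
    key: str
    # Seconds since epoch: the delete time, or the expiry time for a TTL'd column.
    local_deletion_time: int


def is_gcable(t: Tombstone, now: int, gc_grace_seconds: int) -> bool:
    """Simplified purge rule: the tombstone has outlived gc_grace_seconds."""
    return t.local_deletion_time < now - gc_grace_seconds


def partition(
    tombstones: List[Tombstone], now: int, gc_grace_seconds: int
) -> Tuple[List[Tombstone], List[Tombstone]]:
    """Split into (gcable, still_needed).

    A compaction could drop the first list; repair streams both,
    because it sends whatever is currently on disk.
    """
    gcable = [t for t in tombstones if is_gcable(t, now, gc_grace_seconds)]
    still_needed = [t for t in tombstones if not is_gcable(t, now, gc_grace_seconds)]
    return gcable, still_needed


if __name__ == "__main__":
    now = 1_000_000
    sstable = [Tombstone("a", 0), Tombstone("b", 990_000)]
    gcable, needed = partition(sstable, now, gc_grace_seconds=864_000)
    print(f"compaction could drop {len(gcable)}, repair would stream {len(sstable)}")
```

With these made-up timestamps the script prints `compaction could drop 1, repair would stream 2`: until a compaction actually touches the sstable, the gcable tombstone still travels with repair.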
I believe the main problem is that size-tiered compaction tends not to compact the largest sstables very often, so you can end up with large sstables made mostly of gcable tombstones sitting around. In the upcoming Cassandra 1.2, https://issues.apache.org/jira/browse/CASSANDRA-3442 will fix that.

Until then, if you're not afraid of a little scripting, one option is to run a small script before each repair that checks the creation time of your sstables. If an sstable is old enough (where "old enough" depends on the TTL you use on all your columns), you may want to force a compaction of that sstable using the JMX call forceUserDefinedCompaction(). The goal is to get rid of as many outdated tombstones as possible before running the repair. (Alternatively, you could run a major compaction prior to the repair, but major compactions have a lot of nasty side effects, so I wouldn't recommend that a priori.)

-- Sylvain
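The pre-repair script described above might look roughly like the following Python sketch. It is only an illustration, and it rests on several assumptions: every column carries the same TTL; a file's mtime approximates its creation time (sstables are immutable once written); the data directory holds `*-Data.db` files; and "old enough" means TTL plus `gc_grace_seconds`. The reasoning behind that threshold is that every column in an sstable was written before the file was created, so all of them have expired by creation time plus TTL, and the resulting tombstones become purgeable `gc_grace_seconds` later. The helper names are hypothetical. The script only prints candidate file names. Invoking `forceUserDefinedCompaction()` on the CompactionManager MBean (for instance through a JMX client) is left out, because the exact arguments that operation takes (bare file name or full path, with or without a keyspace) differ between Cassandra versions.

```python
import os
import sys
import time
from typing import Iterable, List, Tuple

# Cassandra's default gc_grace_seconds (10 days); use your column family's value.
DEFAULT_GC_GRACE_SECONDS = 864_000


def min_sstable_age_seconds(ttl_seconds: int, gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> int:
    """Age after which, assuming one TTL for every column, all of an sstable's
    columns have expired and their tombstones have outlived gc_grace_seconds."""
    return ttl_seconds + gc_grace_seconds


def scan_data_dir(data_dir: str) -> List[Tuple[str, float]]:
    """Return (path, mtime) for every *-Data.db file under data_dir.
    SSTables are immutable once written, so mtime stands in for creation time."""
    entries = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith("-Data.db"):
                path = os.path.join(root, name)
                try:
                    entries.append((path, os.path.getmtime(path)))
                except FileNotFoundError:
                    # The sstable may have been compacted away since os.walk listed it.
                    continue
    return entries


def select_old_sstables(entries: Iterable[Tuple[str, float]], now: float, min_age: float) -> List[str]:
    """Paths of the sstables at least min_age seconds old, sorted for stable output."""
    return sorted(path for path, mtime in entries if now - mtime >= min_age)


def data_file_names(paths: Iterable[str]) -> List[str]:
    """Bare file names; adapt to whatever form your version's JMX operation expects."""
    return [os.path.basename(p) for p in paths]


def main(argv: List[str]) -> None:
    if len(argv) not in (3, 4):
        print("usage: old_sstables.py DATA_DIR TTL_SECONDS [GC_GRACE_SECONDS]")
        return
    data_dir, ttl = argv[1], int(argv[2])
    gc_grace = int(argv[3]) if len(argv) == 4 else DEFAULT_GC_GRACE_SECONDS
    min_age = min_sstable_age_seconds(ttl, gc_grace)
    old = select_old_sstables(scan_data_dir(data_dir), time.time(), min_age)
    for name in data_file_names(old):
        # Each name is a candidate for its own forceUserDefinedCompaction() call.
        print(name)


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python old_sstables.py /var/lib/cassandra/data 86400` would list the sstables older than one day plus the default grace period. Printing one file per line, and compacting each file on its own, matches the per-sstable compaction suggested above and avoids mixing files from different column families in a single call.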