> We will follow your suggestion and we will run Node Repair tool more
> often in the future. However, what happens to data inserted/deleted
> after Node Repair tool runs (i.e., between Node Repair and Major
> Compaction).

It is handled as you would expect: deletions are propagated across the cluster just like, for example, an overwrite would be.

What makes tombstones special is that deletions are a special case. Normal insertions, overwrites or not, are fine, because given some number of versions of a column there is never an issue deciding which is the latest one. The *lack* of a column, however, is problematic in a distributed system, so active removals are represented by these tombstones.

If you were willing to store tombstones forever, they would not be an issue. But typically that would not make sense, since data that has been removed would keep having a performance impact on the cluster (and take up some disk space). Usually, when you remove data you want it actually *removed*, so that there is no trace of it at all. But as soon as you remove the tombstone, you lose track of the fact that the data was removed. So unless you *know* that there is no data anywhere in the cluster for a column that is older than the tombstone indicating its removal, it is not safe to remove the tombstone.

The grace period and the necessity to run nodetool repair are there for that reason. The periodic nodetool repair is the method by which you can "know" that there *is* in fact no data anywhere in the cluster for a column that is older than the tombstone indicating its removal. Hence, the expiry of the tombstones is safe.

--
/ Peter Schuller
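
To illustrate the reasoning above, here is a toy model in Python. It is only a sketch and not Cassandra's actual code: the names `Cell`, `reconcile` and `read` are hypothetical, a column version is reduced to a timestamp plus a value, and a tombstone is modelled as a cell whose value is `None`. It shows that dropping a tombstone while one replica still holds the older data brings that data back, whereas repairing first and dropping the tombstone afterwards does not.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cell:
    """One version of a column on one replica (hypothetical toy type)."""
    timestamp: int
    value: Optional[str]  # None marks a tombstone in this toy model


def reconcile(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Latest-timestamp-wins merge; a missing cell (None) carries no information."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a.timestamp >= b.timestamp else b


def read(replicas: Sequence[Optional[Cell]]) -> Optional[str]:
    """Merge what every replica holds and return the visible value, if any."""
    winner = None
    for cell in replicas:
        winner = reconcile(winner, cell)
    if winner is None:
        return None
    return winner.value  # None if the winner is a tombstone


old_data = Cell(timestamp=1, value="v1")
tombstone = Cell(timestamp=2, value=None)

# Replica A saw the delete; replica B missed it and still holds the old data.
replica_a, replica_b = tombstone, old_data

# While the tombstone exists, the delete wins over the older data.
before_purge = read([replica_a, replica_b])

# Unsafe: A drops its tombstone before B has learned about the delete.
# Nothing remembers the removal any more, so the old data reappears.
resurrected = read([None, replica_b])

# Safe: repair first, so both replicas hold the tombstone ...
repaired = reconcile(replica_a, replica_b)
replica_a = replica_b = repaired
# ... then both replicas drop it; no older data is left to resurface.
after_repair_and_purge = read([None, None])

print(before_purge, resurrected, after_repair_and_purge)
```

In this model the repair step plays the role of nodetool repair, and dropping the tombstone plays the role of compaction after the grace period has passed.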