> We will follow your suggestion and we will run Node Repair tool more
> often in the future. However, what happens to data inserted/deleted
> after Node Repair tool runs (i.e., between Node Repair and Major
> Compaction).

It is handled as you would expect; deletions are propagated across the
cluster just as, e.g., an overwrite would be.

What makes tombstones special is that deletions are a special case.
Normal insertions, overwrites or not, are fine: given some number of
versions of a column, there is never an issue deciding which is the
latest. The *lack* of a column, however, is problematic in a
distributed system, so an active removal is represented by a
tombstone. If you were willing to store tombstones forever, they would
not be an issue. But typically that would not make sense, since
removed data would keep having a performance impact on the cluster
(and take up some disk space). Usually, when you remove data you want
it actually *removed*, so that there is no trace of it at all. But as
soon as you remove the tombstone, you lose track of the fact that the
data was removed. So unless you *know* that no data older than the
tombstone indicating its removal remains anywhere in the cluster for
that column, the tombstone is not safe to remove.

So the grace period, and the need to run nodetool repair, exist for
that reason. Periodic nodetool repair, completed at least once within
each grace period, is how you can "know" that there is in fact no data
anywhere in the cluster for a column that is older than the tombstone
indicating its removal. Once that is known, expiring the tombstone is
safe.
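
To make the failure mode concrete, here is a rough toy model in Python.
It is only a sketch, not how Cassandra is implemented: the names Cell,
reconcile, purge_expired and visible are made up for illustration, the
tie-break rule is an assumption of the sketch, and the grace period is
measured against the write timestamp purely to keep it short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical single column version; value=None marks a tombstone."""
    timestamp: int
    value: Optional[str] = None

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def reconcile(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Pick the newest version. A missing cell (None) carries no information."""
    if a is None:
        return b
    if b is None:
        return a
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Assumption of this sketch: on equal timestamps the tombstone wins.
    return a if a.is_tombstone else b


def purge_expired(cell: Optional[Cell], now: int, grace: int) -> Optional[Cell]:
    """Drop a tombstone once it is older than the grace period."""
    if cell is not None and cell.is_tombstone and now - cell.timestamp >= grace:
        return None
    return cell


def visible(cell: Optional[Cell]) -> Optional[str]:
    """What a reader sees: nothing for a missing cell or a tombstone."""
    return None if cell is None or cell.is_tombstone else cell.value


GRACE = 10
tombstone = Cell(timestamp=2)          # replica A saw the delete
stale = Cell(timestamp=1, value="v")   # replica B missed the delete

# 1. Tombstone still present: the delete wins over the stale value.
kept = visible(reconcile(tombstone, stale))

# 2. Tombstone purged before B learned of the delete: the data comes back.
purged_early = visible(reconcile(purge_expired(tombstone, 100, GRACE), stale))

# 3. Repair first (B receives the tombstone), then both replicas purge.
repaired_b = reconcile(stale, tombstone)
repaired_then_purged = visible(
    reconcile(purge_expired(tombstone, 100, GRACE),
              purge_expired(repaired_b, 100, GRACE))
)

print(kept, purged_early, repaired_then_purged)
```

Case 2 is the resurrection problem: once the tombstone is gone, the
stale value on the other replica is the only information left, so it
wins. Case 3 is why repair has to finish before the tombstone expires.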

-- 
/ Peter Schuller
