We're finding that reading deleted columns can be very slow and I'm
trying to get confirmation for our analysis of what happens. We wrote
lots of data eons ago into fairly large rows (up to 1MB). We recently
read those rows and then deleted them. After this, we ran a
verification-type pass that attempts to re-read these rows and verifies
that they are indeed deleted. The interval between the deletion and
verification pass was far less than gc_grace. We noticed that the
verification pass took as much time as the read-and-delete pass(!), while
verifying the non-existence of rows that never existed is blindingly
fast in comparison. So it seems that Cassandra is reading the old data,
reading the new tombstones, and then returning "there is no data".
That is functionally correct, but the performance characteristics are
rather unexpected. Am I missing something, or is this expected?
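
To make the hypothesis concrete, here is a toy Python model of what we
think the read path is doing. It is only a sketch of our guess, not
Cassandra's code: SSTable, read_row, and the bytes-scanned cost measure
are made-up names, and it assumes the tombstones sit in a newer file than
the original data until compaction merges them.

```python
# Toy model only: NOT Cassandra's actual code. SSTable, read_row and the
# byte-count cost measure are hypothetical names invented for this sketch.

TOMBSTONE = object()  # marker for a deleted column


class SSTable:
    """An immutable data file, modeled as {row_key: {column: (timestamp, value)}}."""

    def __init__(self, rows):
        self.rows = rows

    def get(self, row_key):
        return self.rows.get(row_key)


def read_row(sstables, row_key):
    """Merge one row across all files; return (live_columns, bytes_scanned)."""
    merged = {}
    bytes_scanned = 0
    for table in sstables:
        fragment = table.get(row_key)
        if fragment is None:
            continue  # in this model, a row that was never written costs nothing
        for column, (timestamp, value) in fragment.items():
            if value is not TOMBSTONE:
                bytes_scanned += len(value)  # old data is still read in full
            # the newest timestamp wins, whether it is data or a tombstone
            if column not in merged or timestamp > merged[column][0]:
                merged[column] = (timestamp, value)
    # tombstoned columns are dropped only after the merge
    live = {c: v for c, (_, v) in merged.items() if v is not TOMBSTONE}
    return live, bytes_scanned


old = SSTable({"row1": {"col": (1, b"x" * 1_000_000)}})  # ~1MB row written long ago
new = SSTable({"row1": {"col": (2, TOMBSTONE)}})         # the recent delete

deleted_result = read_row([old, new], "row1")
missing_result = read_row([old, new], "row2")
print(deleted_result)  # ({}, 1000000)
print(missing_result)  # ({}, 0)
```

In this model the deleted row and the never-written row both come back
empty, but only the deleted one pays for scanning the old 1MB column,
which matches the timing we saw.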
Thanks!
Thorsten