In the system we're using, we have a large fleet of servers constantly
appending time-based data to our database. The workload is almost entirely
writes with very few reads (it's auditing data). However, our cluster's
maximum capacity is around 80TB, and we'd like to maximize how much data we
can retain.

One option is to delete all old records or to set a TTL, but that requires
a substantial clean-up process that we could easily avoid if we could
simply drop the oldest sstables outright. For example, when a node reaches
90% disk usage, drop its oldest sstable. Obviously, the oldest sstable on
one node may not be the same as the oldest sstable on another, but since
this is the oldest data, that is an acceptable inconsistency.
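To make the proposed policy concrete, here is a minimal sketch of the
selection rule only: once usage reaches 90% of capacity, pick the sstable
whose newest data is the oldest. The `SSTable` record, its `max_timestamp`
field, and the `oldest_to_drop` helper are hypothetical, not part of any
Cassandra API. The sketch deliberately stops at choosing a candidate and
does not touch any files, since whether removal is safe is the open
question.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

THRESHOLD = 0.90  # the 90% disk-usage trigger described above


@dataclass(frozen=True)
class SSTable:
    # Hypothetical record: an sstable identifier plus the timestamp of the
    # newest write it contains (assumption: this metadata is available).
    name: str
    max_timestamp: int


def oldest_to_drop(
    sstables: Sequence[SSTable],
    used_bytes: int,
    capacity_bytes: int,
    threshold: float = THRESHOLD,
) -> Optional[SSTable]:
    """Return the sstable whose newest data is oldest once disk usage
    reaches the threshold; return None if no drop is needed."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    if not sstables or used_bytes / capacity_bytes < threshold:
        return None
    # Under the stated assumption, the sstable with the smallest max
    # timestamp holds only data older than every other sstable.
    return min(sstables, key=lambda t: t.max_timestamp)


if __name__ == "__main__":
    tables = [SSTable("a", 300), SSTable("b", 100), SSTable("c", 200)]
    print(oldest_to_drop(tables, used_bytes=91, capacity_bytes=100))
    print(oldest_to_drop(tables, used_bytes=50, capacity_bytes=100))
```

Each node would evaluate this against its own sstables, which is where the
per-node inconsistency mentioned above comes from.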

Is this possible to do safely? The data in the oldest sstable is always
guaranteed to be the oldest data, so that is not my concern. My main
concern is whether we can do this at all, and also how we can notify
Cassandra that an sstable has been removed underneath it.

tl;dr: Can I routinely remove the oldest sstable to free up disk space
without destabilizing Cassandra?

Thanks for your feedback!

Andrew
