Re: Time series data with only inserts

Jeff Jirsa Mon, 30 May 2016 21:27:13 -0700

Your compaction strategy gets triggered whenever you flush memtables to disk.

Most compaction strategies, especially those designed for write-only 
time-series workloads, check for fully expired sstables 
(getFullyExpiredSSTables()) “often” (DTCS does it every 10 minutes, because 
it’s fairly expensive). That’s THE most efficient way to drop expired data: 
the whole sstable is dropped because everything in it has expired. Given that 
you’re not doing reads, it’s likely that getFullyExpiredSSTables will have few 
(or no) blockers, and will find and return fully expired sstables 7 days after 
they’re created, assuming you use a compaction strategy that doesn’t mix old 
data with new data (DTCS is the only ‘official’ one that does this now, though 
TWCS in #9666 may be interesting to you).
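
To make the "fully expired" check and the idea of a blocker concrete, here is 
a rough Python model of the logic. It is a simplified sketch, not Cassandra's 
actual code: the SSTable fields, the fully_expired() helper and the gc_before 
cutoff parameter are names assumed for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    # Hypothetical, simplified view of per-sstable metadata.
    name: str
    min_timestamp: int            # oldest write timestamp in the file
    max_timestamp: int            # newest write timestamp in the file
    max_local_deletion_time: int  # latest expiry/deletion time of any cell


def fully_expired(candidates: List[SSTable],
                  overlapping: List[SSTable],
                  gc_before: int) -> List[SSTable]:
    """Return the candidates that could be dropped whole (assumed model)."""
    # Step 1: every cell in the file must have expired before the cutoff.
    expired = [s for s in candidates if s.max_local_deletion_time < gc_before]

    # Step 2: overlapping sstables that still hold live data act as blockers.
    blockers = [s for s in overlapping if s.max_local_deletion_time >= gc_before]
    min_live_ts = min((s.min_timestamp for s in blockers), default=float("inf"))

    # An expired sstable is only safe to drop if all of its data is older
    # than anything still live in an overlapping sstable.
    return [s for s in expired if s.max_timestamp < min_live_ts]


old = SSTable("old", min_timestamp=0, max_timestamp=100, max_local_deletion_time=1000)
new = SSTable("new", min_timestamp=200, max_timestamp=300, max_local_deletion_time=5000)
# A read-repaired sstable mixing old timestamps with data that is still live.
repaired = SSTable("repaired", min_timestamp=50, max_timestamp=60, max_local_deletion_time=5000)

droppable = fully_expired([old, new], [old, new], gc_before=2000)
blocked = fully_expired([old, new, repaired], [old, new, repaired], gc_before=2000)
```

In this model `droppable` contains only the old sstable, while `blocked` is 
empty: the repaired sstable holds live data with older timestamps, so it 
blocks the drop.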

Unfortunately, life being what it is, it’s pretty easy to end up in a situation 
where read repairs or other overlaps cause ‘blockers’ which prevent sstables 
from being fully expired. In those situations, the tombstone compaction 
sub-properties can nudge things in the right direction (for example, you can 
tell Cassandra to compact an sstable with itself if it’s over 24 hours old and 
contains more than 80% tombstones, where 24 and 80 are both values you 
control, via tombstone_compaction_interval and tombstone_threshold). Check out 
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/compactSubprop.html for 
the tombstone-related options.
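
As a rough illustration of the 24 hour / 80% example, the small Python helper 
below builds the corresponding ALTER TABLE statement. The helper name and the 
table name ks.events are made up for this sketch; tombstone_compaction_interval 
(in seconds) and tombstone_threshold (a ratio) are the sub-properties described 
on the page linked above.

```python
def tombstone_compaction_cql(table: str,
                             max_age_hours: int = 24,
                             tombstone_ratio: float = 0.8,
                             strategy: str = "DateTieredCompactionStrategy") -> str:
    """Build an ALTER TABLE statement (hypothetical helper, for illustration)."""
    options = {
        "class": strategy,
        # Minimum sstable age, in seconds, before a single-sstable
        # tombstone compaction is considered.
        "tombstone_compaction_interval": str(max_age_hours * 3600),
        # Ratio of droppable tombstones that makes an sstable a candidate.
        "tombstone_threshold": str(tombstone_ratio),
    }
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


# ks.events is an assumed keyspace.table name.
statement = tombstone_compaction_cql("ks.events")
print(statement)
```

Run the resulting statement through cqlsh or your driver of choice, and adjust 
both numbers to your own TTL and workload.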

- Jeff

On 5/30/16, 3:54 PM, "Rakesh Kumar" <rakeshkumar46...@gmail.com> wrote:

>Let us assume that there is a table which gets only inserts and under
>normal circumstances no reads on it. If we assume TTL to be 7 days,
>what event will trigger a compaction/purge of old data if the old data
>is not in the mem cache and no session needs it.
>
>thanks.
