There’s a company using TWCS in this kind of configuration - I’m not going to out
them, but I believe they do it (or used to) with aggressive tombstone
sub-properties. They may have since extended/enhanced it somewhat.
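
For reference, the kind of thing I mean looks roughly like the following
(illustrative values only, not anyone's actual settings):

```sql
ALTER TABLE events WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': '1',
  -- aggressive tombstone sub-properties:
  'unchecked_tombstone_compaction': 'true',  -- allow single-sstable tombstone compactions
  'tombstone_threshold': '0.2',              -- compact once ~20% of an sstable is tombstones
  'tombstone_compaction_interval': '43200'   -- recheck an sstable every 12 hours
};
```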

-- 
Jeff Jirsa


> On Feb 16, 2018, at 2:24 PM, Carl Mueller <carl.muel...@smartthings.com> 
> wrote:
> 
> Oh, and a further refinement beyond our use case:
> 
> If we could group/organize the sstables by the time value in the row key or
> by the inherent TTL value, the naive version would be evenly distributed
> buckets extending into the future.
> 
> But many/most data patterns like this have "busy" data in the near term,
> while far-out scheduled items are sparser. In our case, 50% of the data
> is in the first 12 hours, 50% of the remainder in the next day or two, 50%
> of what's left in the next week, and so on.
> 
> So we could have a "long term" general bucket to take data far in the
> future. But here's the thing: if we could periodically reprocess the "long
> term" sstable into two sstables - the stuff that is still "long term" and
> sstables for the "near term" - that could solve many general cases. The
> "long term" bucket could even be STCS by default, and as the near term
> comes into play, that would be treated as a different "level".
> 
> Of course all this relies on the ability to look at the data in the rowkey
> or the TTL associated with the row.
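
The bucketing distribution described above could be sketched like this (a
hypothetical sketch - `bucket_for`, the 12-hour base window, and the doubling
factor are all illustrative assumptions, not anything Cassandra provides):

```python
from datetime import timedelta

def bucket_for(ttl_seconds, base=timedelta(hours=12), max_buckets=8):
    """Return the index of the time window an expiry falls into.

    Window 0 covers [0, base) seconds from now; each later window i
    covers [base * 2**(i-1), base * 2**i), so windows roughly double
    in width as expiry moves further out. Anything past the last
    window lands in a catch-all "long term" bucket (index max_buckets).
    """
    limit = base.total_seconds()
    for i in range(max_buckets):
        if ttl_seconds < limit:
            return i
        limit *= 2
    return max_buckets  # long-term catch-all bucket
```

The "long term" bucket here is the one that would get reprocessed on a
regular basis, splitting out rows whose expiry has drifted into a near-term
window.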
> 
> On Fri, Feb 16, 2018 at 4:17 PM, Carl Mueller <carl.muel...@smartthings.com>
> wrote:
> 
>> We have a scheduler app here at smartthings, where we track per-second
>> tasks to be executed.
>> 
>> These rows are all TTL'd to be destroyed once the second the event was
>> registered for has passed.
>> 
>> If the scheduling window were sufficiently small - say, one day - we could
>> probably use a time window compaction strategy for this. But the window is
>> one to two years' worth of ad-hoc event registration per the contract.
>> 
>> Thus, because events are registered at different times, data that TTLs at
>> very different times is intermingled, and the sstables are not written
>> with data expiring in the same rough time period. If they were, compaction
>> would be a relatively easy process, since entire sstables would tombstone
>> at once.
>> 
>> We could kind of do this by doing sharded tables for the time periods and
>> rotating the shards for duty, and truncating them as they are recycled.
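
A minimal sketch of that sharding idea (the `shard_for` name and the
four-shard/one-week layout are assumptions for illustration, not our actual
setup):

```python
def shard_for(expiry_epoch, window_seconds=7 * 86400, num_shards=4):
    """Pick which shard table a row belongs to, based on when it expires.

    Each shard owns one contiguous time window, and shards are reused
    round-robin; once every row in a shard's window has expired, the
    whole table can be truncated and rotated back into duty.
    """
    return (expiry_epoch // window_seconds) % num_shards
```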
>> 
>> But a more elegant approach would be a custom compaction strategy that
>> "windows" the data into time-clustered sstables, which could then be
>> compacted with other similarly time-bucketed sstables.
>> 
>> This would require visibility into the row key when it comes time to flush
>> memtable data to sstables. Is that even possible with compaction
>> strategies? We would impose the requirement that the time-based value be
>> part of the row key (a designated component, if it is a composite key).
>> 
>> 
>> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org