There’s a company using TWCS in this config. I’m not going to out them, but I think they do it (or used to) with aggressive tombstone sub-properties. They may have since extended/enhanced it somewhat.
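For context, a minimal sketch of what that sort of config might look like; the table and the specific values here are illustrative assumptions on my part, not that company's actual settings:

    -- Hypothetical table: TWCS plus "aggressive" tombstone sub-properties.
    CREATE TABLE events (
        id uuid,
        ts timestamp,
        payload text,
        PRIMARY KEY (id, ts)
    ) WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'HOURS',
        'compaction_window_size': 12,
        -- run single-sstable tombstone compactions even when the sstable
        -- overlaps others (off by default):
        'unchecked_tombstone_compaction': 'true',
        -- consider an sstable once ~10% of it is droppable tombstones,
        -- instead of the default 20%:
        'tombstone_threshold': '0.1',
        -- recheck each sstable hourly instead of the default daily (seconds):
        'tombstone_compaction_interval': '3600'
    };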
-- Jeff Jirsa

> On Feb 16, 2018, at 2:24 PM, Carl Mueller <carl.muel...@smartthings.com> wrote:
>
> Oh, and as a further refinement outside of our use case:
>
> If we could group/organize the sstables by the rowkey time value or
> inherent TTL value, the naive version would be evenly distributed
> buckets into the future.
>
> But many/most data patterns like this have "busy" data in the near
> term; far-out scheduled stuff is sparser. In our case, 50% of the data
> is in the first 12 hours, 50% of the remainder in the next day or two,
> 50% of what's left in the next week, etc., etc.
>
> So we could have a "long term" general bucket to take data far in the
> future. But here's the thing: if we could regularly reprocess the
> "long term" sstable into two outputs, the stuff that is still "long
> term" and new sstables for the "near term", that could solve many
> general cases. The "long term" bucket could even be STCS by default,
> and as the near term comes into play, that is considered a different
> "level".
>
> Of course, all this relies on the ability to look at the time data in
> the rowkey or the TTL associated with the row.
>
> On Fri, Feb 16, 2018 at 4:17 PM, Carl Mueller <carl.muel...@smartthings.com> wrote:
>
>> We have a scheduler app here at SmartThings, where we track per-second
>> tasks to be executed.
>>
>> These are all TTL'd to be destroyed once the second they were
>> registered for has passed.
>>
>> If the scheduling window were sufficiently small, say one day, we
>> could probably use a time window compaction strategy for this. But the
>> window is one to two years' worth of ad hoc event registration per the
>> contract.
>>
>> Because events are registered at different times and thus TTL at
>> different times, the sstables are not written with data TTL'ing in
>> roughly the same time period. If they were, compaction would be a
>> relatively easy process, since the entire sstable would tombstone.
>>
>> We could approximate this with sharded tables for the time periods,
>> rotating the shards for duty and truncating them as they are recycled.
>>
>> But an elegant way would be a custom compaction strategy that would
>> "window" the data into clustered sstables that could be compacted with
>> other similarly time-bucketed sstables.
>>
>> This would require visibility into the rowkey when it came time to
>> convert the memtable data to sstables. Is that even possible with
>> compaction schemes? We would make it a requirement that the time-based
>> data be in the row key, if it is a composite row key.
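For concreteness, a hedged sketch of the kind of schema the scheduler case above implies; the table name and shape are hypothetical, not SmartThings' actual schema:

    -- Hypothetical schema: every row carries its own TTL, so rows that
    -- expire a year apart get flushed into the same sstables.
    CREATE TABLE scheduled_tasks (
        shard int,
        fire_at timestamp,
        task_id uuid,
        payload text,
        PRIMARY KEY ((shard, fire_at), task_id)
    );

    -- A task firing in an hour and one firing in a year are written
    -- back-to-back; their tombstones arrive ~364 days apart.
    INSERT INTO scheduled_tasks (shard, fire_at, task_id, payload)
    VALUES (0, '2018-02-17 15:00:00', uuid(), 'near-term task')
    USING TTL 3600;

    INSERT INTO scheduled_tasks (shard, fire_at, task_id, payload)
    VALUES (0, '2019-02-16 15:00:00', uuid(), 'far-future task')
    USING TTL 31536000;

This is exactly the pattern TWCS can't bucket: the write times are the same, but the expiry times vary by orders of magnitude.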
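And a sketch of the sharded/truncate workaround mentioned above, again with hypothetical names; the appeal is that TRUNCATE drops whole sstables, so no tombstones or compaction are involved:

    -- N identical tables, each owning one coarse time window; the app
    -- routes writes to the bucket covering each task's fire time.
    CREATE TABLE tasks_bucket_0 (
        fire_at timestamp,
        task_id uuid,
        payload text,
        PRIMARY KEY (fire_at, task_id)
    );
    -- ...tasks_bucket_1 through tasks_bucket_N, same schema...

    -- Once bucket 0's window has fully passed (plus a safety margin),
    -- wipe it and reassign it to the next future window:
    TRUNCATE tasks_bucket_0;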