Situation: We use TWCS for a task history table. The partition key is the user and the column key is the timeuuid of the task. We use TWCS because of the TTLs (and resulting tombstones) that rotate the tasks out every month or so.
However, suppose we want a "slice" of tasks: say, the tasks from the last two days, with TWCS configured for 12-hour sstable windows. The problem is that this is a frequent user with tasks in ALL of the time-bucketed sstables that TWCS produces. So, as I understand it, Cassandra first has to read, say, 80 sstables to reconstruct the row, and only THEN can it exclude/slice on the column key.
Question: Am I wrong that the read path needs to grab all relevant sstables before applying column key slicing, and is excluding sstables up front already possible? Admittedly we are on 2.1 for this table (we are in the process of upgrading, now that we have an automated upgrade program that seems to work pretty well).
If my assumption is correct: the compaction strategy knows, as it writes each sstable, which bucket it is putting it in (and could encode that in the sstable metadata?). So if the whole row really does need to be reconstructed before slicing, and we had a perfect infinite-monkey coding team that could build whatever we wanted within some feasibility, could we provide special hooks to exclude sstables from a read when the metadata indicates which column keys an sstable can or cannot contain?
Goal: The overall goal would be to support excluding sstables from the read path, in case we had compaction strategies hand-tailored for other queries. Essentially we would be doing a first-pass bucket-sort exclusion, with the sstable metadata marking the buckets. This might also help with super-wide rows and paging through column keys, if we allowed the table creator to specify bucketing as flushing occurs. In general, query performance appears to degrade quickly with the number of sstables required for a lookup. I still don't know the code nearly well enough to write patches, but from looking at custom compaction strategies and the basic read path, this seems like it would be a useful extension for advanced users.
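
To make the bucket-exclusion idea concrete, here is a minimal sketch in Python. It is not Cassandra code and does not use any Cassandra API: `SSTableMeta`, its `window_start`/`window_end` fields, and `sstables_for_slice` are hypothetical names, and it assumes each sstable's metadata records the half-open time window its column keys fall in.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SSTableMeta:
    """Hypothetical per-sstable metadata written by the compaction strategy."""
    name: str
    window_start: datetime  # inclusive lower bound of column-key times
    window_end: datetime    # exclusive upper bound of column-key times


def sstables_for_slice(sstables, slice_start, slice_end):
    """First-pass exclusion: keep only sstables whose window overlaps
    the half-open slice [slice_start, slice_end)."""
    return [
        s for s in sstables
        if s.window_start < slice_end and s.window_end > slice_start
    ]


# Example: 80 twelve-hour buckets ending at an arbitrary reference time.
now = datetime(2018, 1, 31)
window = timedelta(hours=12)
sstables = [
    SSTableMeta(f"sstable-{i}", now - (i + 1) * window, now - i * window)
    for i in range(80)
]

# A two-day slice only needs the sstables whose windows overlap it.
candidates = sstables_for_slice(sstables, now - timedelta(days=2), now)
print([s.name for s in candidates])
```

Under these assumptions the two-day slice touches only the four most recent buckets instead of all 80; the column-key slicing itself would then run over just those candidates.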
The fallback would be a set of tables that serve as buckets: a query spans into the next bucket when one bucket runs out, and the tables rotate.
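
A rough sketch of how that fallback could pick which bucket tables a query has to span, again in Python. Everything here is an assumption for illustration: the `task_history_` table prefix, the fixed bucket width, and the modulo scheme for rotating a fixed number of tables are not something we have built.

```python
from datetime import datetime, timedelta, timezone


def tables_for_range(start, end, bucket_width, num_tables,
                     prefix="task_history_"):
    """Return the bucket tables covering [start, end], newest first.

    Assumes time is cut into fixed-width slots and slot N lives in
    table (N % num_tables), so tables are reused as they rotate.
    """
    width = bucket_width.total_seconds()
    first_slot = int(start.timestamp() // width)
    last_slot = int(end.timestamp() // width)
    if last_slot - first_slot >= num_tables:
        # The range is wider than the rotation; older slots are overwritten.
        raise ValueError("range spans more buckets than there are tables")
    return [f"{prefix}{slot % num_tables}"
            for slot in range(last_slot, first_slot - 1, -1)]


# Example: four one-day tables, querying the last two days.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
end = epoch + timedelta(days=10, hours=6)
tables = tables_for_range(end - timedelta(days=2), end,
                          bucket_width=timedelta(days=1), num_tables=4)
print(tables)
```

The client would query the first table in the list and move on to the next one only when the current bucket runs out of rows.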