Hi Jonathan,

Thanks for the suggestion! I see a couple of problems with this approach:

1. I do not know a priori all of the family names (so I still would not know what value to use for LIMIT).
2. The "versions" here are similar to timestamps, so one "family" may get updated far more often than another. Hence, if I order all of my data by version, the first 1000 rows in version order could all come from the same family. What I want is just the most recent value (or the N most recent values) for each unique family.

I don't think there is a way to do this without performing some client-side filtering, but I thought I'd see if anyone has any ideas. I'm translating a framework that was originally designed on top of HBase, where offering this kind of functionality (by using HBase's "timestamp dimension") was easy. :)

Best regards,
Clint

On Tue, Feb 25, 2014 at 4:51 PM, Jonathan Lacefield <jlacefi...@datastax.com> wrote:
> Clint
>
> One approach would be to create a copy of this table and switch the
> clustering columns around so version precedes family. This way you
> could easily grab the 1st, 2nd, N version rows. Would this help you
> in your situation?
>
> Jonathan
>
>
> On Feb 25, 2014, at 7:49 PM, Clint Kelly <clint.ke...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > Let's say that I have a table that looks like the following:
> >
> > CREATE TABLE time_series_stuff (
> >   key text,
> >   family text,
> >   version int,
> >   val text,
> >   PRIMARY KEY (key, family, version)
> > ) WITH CLUSTERING ORDER BY (family ASC, version DESC) AND
> >   bloom_filter_fp_chance=0.010000 AND
> >   caching='KEYS_ONLY' AND
> >   comment='' AND
> >   dclocal_read_repair_chance=0.000000 AND
> >   gc_grace_seconds=864000 AND
> >   index_interval=128 AND
> >   read_repair_chance=0.100000 AND
> >   replicate_on_write='true' AND
> >   populate_io_cache_on_flush='false' AND
> >   default_time_to_live=0 AND
> >   speculative_retry='99.0PERCENTILE' AND
> >   memtable_flush_period_in_ms=0 AND
> >   compaction={'class': 'SizeTieredCompactionStrategy'} AND
> >   compression={'sstable_compression': 'LZ4Compressor'};
> >
> > cqlsh:fiddle> select * from time_series_stuff ;
> >
> >  key    | family  | version | val
> > --------+---------+---------+--------
> >  monday | revenue |       3 | $$$$$$
> >  monday | revenue |       2 | $$$
> >  monday | revenue |       1 | $$
> >  monday | revenue |       0 | $
> >  monday | traffic |       2 | medium
> >  monday | traffic |       1 | light
> >  monday | traffic |       0 | heavy
> >
> > (7 rows)
> >
> > Now let's say that I'd like to perform a query that gets me the most
> > recent N versions of "revenue" and "traffic."
> >
> > Is there a CQL query to do this? Let's say that N=1. Then I know that
> > I can do:
> >
> > cqlsh:fiddle> select * from time_series_stuff where key='monday' and family='revenue' limit 1;
> >
> >  key    | family  | version | val
> > --------+---------+---------+--------
> >  monday | revenue |       3 | $$$$$$
> >
> > (1 rows)
> >
> > cqlsh:fiddle> select * from time_series_stuff where key='monday' and family='traffic' limit 1;
> >
> >  key    | family  | version | val
> > --------+---------+---------+--------
> >  monday | traffic |       2 | medium
> >
> > (1 rows)
> >
> > But what if I have lots of "families" and I want to get the most recent
> > N versions of all of them in a single CQL statement. Is that possible?
> > Unfortunately I am working on something where the family names and the
> > number of most-recent versions are not known a priori (I am porting some
> > code that was designed for HBase).
> >
> > Best regards,
> > Clint
>
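For illustration, here is a minimal Python sketch of the client-side filtering discussed in this thread. It assumes the client has already run a single-partition query such as "select * from time_series_stuff where key='monday'" and received the rows as (family, version, val) tuples; the function name latest_n_per_family and that tuple shape are hypothetical, not part of any driver API. Because Cassandra returns rows within a partition in clustering order (family ASC, version DESC here), the first N rows seen for a family are its N most recent versions, so the family names do not need to be known ahead of time.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape: (family, version, val) for one partition key.
Row = Tuple[str, int, str]


def latest_n_per_family(rows: Iterable[Row], n: int) -> Dict[str, List[Row]]:
    """Keep the first n rows seen for each family.

    Assumes each family's rows arrive in descending version order, which a
    single-partition query on time_series_stuff gives us because of
    CLUSTERING ORDER BY (family ASC, version DESC).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    kept: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        family = row[0]
        # Rows beyond the first n for a family are older versions; skip them.
        if len(kept[family]) < n:
            kept[family].append(row)
    return dict(kept)


if __name__ == "__main__":
    # The partition from the example above, in clustering order.
    monday = [
        ("revenue", 3, "$$$$$$"),
        ("revenue", 2, "$$$"),
        ("revenue", 1, "$$"),
        ("revenue", 0, "$"),
        ("traffic", 2, "medium"),
        ("traffic", 1, "light"),
        ("traffic", 0, "heavy"),
    ]
    print(latest_n_per_family(monday, 1))
```

The trade-off is that the whole partition is still read from Cassandra; only the trimming happens on the client. Keying a dict by family (rather than relying on itertools.groupby) means the sketch does not depend on a family's rows being contiguous, only on each family's rows arriving newest first.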