On Fri, Mar 1, 2013 at 5:16 PM, Adam Venturella <aventure...@gmail.com>wrote:
> My ColumnFamily is defined as follows: > > > CREATE TABLE UserProfileHistory( > username text, > timestamp bigint, -- millis since epoch > data text, -- JSON > PRIMARY KEY (username, timestamp) > ) WITH CLUSTERING ORDER BY (timestamp DESC); > > > Each insert on the username adds to the wide row. The most recent profile > history being able to be retrieved by > > SELECT * FROM UserProfileHistory WHERE username=:username LIMIT 1; > > For some reporting needs I need to fetch the entire history, and I need to > do it in ASC order instead of DESC. > > One option is to do the sorting in code, collect N results, sort on the > timestamps accordingly. Given the row is of N length, that could start to > put an undo memory burden in my application layer, and I would like to > avoid that if possible opting instead for Cassandra to perform the work. > > > So I am leaning towards this option: > > 2) min timestamp seek + ORDER BY > > To start the process my initial timestamp would be > 01-01-1970T12:00:00+0000 (assume that is in milliseconds, aka 0) I would > then issue my query: > > SELECT * FROM UserProfileHistory WHERE username=:username AND timestamp > > :milliseconds ORDER BY timestamp ASC LIMIT 100 > > Once I have those initial results I would just pick my last timestamp from > the result set and + 1 on it and run the query again until I received 0 > results. > > > The CQL works and returns my results as I expect. This will probably only > be run once every 24 hours, maybe every 12 hours; point being, not often. > > Am I setting myself up for a disaster down the line? > No. Paging over a partition key like you do in reverse order of the clustering order by is slightly slower than doing it in the clustering order, but not by a whole lot. It's slightly slower because 1) there will be backward seek underneath between the on-disk index block (not a huge deal) and 2) there is some reversing of lists going on before returning each query (again, not a huge deal). You'll be totally fine, especially if that query is not the one on which latency is the most critical. -- Sylvain