On Fri, Mar 1, 2013 at 5:16 PM, Adam Venturella <aventure...@gmail.com> wrote:

> My ColumnFamily is defined as follows:
>
>
> CREATE TABLE UserProfileHistory(
>     username text,
>     timestamp bigint, -- millis since epoch
>     data text, -- JSON
>     PRIMARY KEY (username, timestamp)
> ) WITH CLUSTERING ORDER BY (timestamp DESC);
>
>
> Each insert on the username adds to the wide row. The most recent profile
> history can be retrieved with
>
> SELECT * FROM UserProfileHistory WHERE username=:username LIMIT 1;
>
> For some reporting needs I need to fetch the entire history, and I need to
> do it in ASC order instead of DESC.
>
> One option is to do the sorting in code: collect N results, then sort on
> the timestamps. Given the row is of N length, that could start to
> put an undue memory burden on my application layer, and I would like to
> avoid that if possible, opting instead for Cassandra to perform the work.
>
>
> So I am leaning towards this option:
>
> 2) min timestamp seek + ORDER BY
>
> To start the process my initial timestamp would be
> 1970-01-01T00:00:00+0000 (that is, 0 in milliseconds). I would
> then issue my query:
>
> SELECT * FROM UserProfileHistory WHERE username=:username AND timestamp >
> :milliseconds ORDER BY timestamp ASC LIMIT 100
>
> Once I have those initial results I would just pick the last timestamp from
> the result set, add 1 to it, and run the query again until I receive 0
> results.
>
>
> The CQL works and returns my results as I expect. This will probably only
> be run once every 24 hours, maybe every 12 hours; point being, not often.
>
> Am I setting myself up for a disaster down the line?
>

No.

Paging within a partition in the reverse of its clustering order, as you do,
is slightly slower than paging in the clustering order, but not by a whole
lot. It's slightly slower because 1) there will be backward seeks underneath
between the on-disk index blocks (not a huge deal) and 2) there is some
reversing of lists going on before each query returns (again, not a huge
deal). You'll be totally fine, especially if that query is not the one on
which latency is most critical.
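
For illustration only, the application-side paging loop described in the
question could be sketched roughly as below. This is a minimal sketch in
Python, not driver code: fetch_page is a hypothetical in-memory stand-in for
the "timestamp > :milliseconds ORDER BY timestamp ASC LIMIT n" query, and
iter_history and Row are names invented for this example. One detail worth
noting: because the comparison is a strict ">", the last timestamp seen can be
passed back unchanged. Adding 1 to it first would skip a row whose timestamp
is exactly one millisecond later.

```python
from typing import Callable, Iterator, List, Tuple

# (timestamp in millis since epoch, JSON data); hypothetical row shape
Row = Tuple[int, str]


def fetch_page(rows: List[Row], after_ts: int, limit: int) -> List[Row]:
    """In-memory stand-in (an assumption, not a driver call) for:
    SELECT * FROM UserProfileHistory
    WHERE username=:username AND timestamp > :after_ts
    ORDER BY timestamp ASC LIMIT :limit
    """
    matching = sorted(r for r in rows if r[0] > after_ts)
    return matching[:limit]


def iter_history(
    fetch: Callable[[int, int], List[Row]],
    page_size: int = 100,
    start_ts: int = 0,
) -> Iterator[Row]:
    """Yield rows in ascending timestamp order, one page at a time."""
    last_ts = start_ts
    while True:
        page = fetch(last_ts, page_size)
        if not page:
            # An empty page means the partition has been fully read.
            return
        yield from page
        # Strict ">" in the query: reuse the last timestamp as-is, no +1.
        last_ts = page[-1][0]


# Example usage with a small unsorted in-memory "partition".
sample_rows = [(t, "{}") for t in (5, 1, 3, 2, 4, 6, 7)]
history = list(iter_history(lambda ts, n: fetch_page(sample_rows, ts, n), page_size=3))
```

Only one page of rows is held at a time, which is the memory concern raised
in the question. As in the original query, a row with timestamp exactly 0
would not be returned when starting from 0.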

--
Sylvain
