Hi everyone,

I am trying to design a schema that will keep the N-most-recent
versions of a value.  Currently my table looks like the following:

CREATE TABLE foo (
    rowkey text,
    family text,
    qualifier text,
    version bigint,
    value blob,
    PRIMARY KEY (rowkey, family, qualifier, version))
WITH CLUSTERING ORDER BY (family ASC, qualifier ASC, version DESC);

Is there a standard design pattern for writing to such a table so
that only the N most recent (version, value) pairs are kept for every
unique (rowkey, family, qualifier)?  I can't think of any way to do this
without a read-modify-write.  The best alternative I can think of is
to use TTL to approximate the desired behavior (which only works if I
know how often we write new data to the table).  I could also
use "LIMIT N" in my queries to return only the N newest items, but that
does not address any of the storage-size issues.
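
For concreteness, here is roughly what I mean by those two workarounds.
This is only a sketch: the key values, the one-day TTL (86400 seconds),
and N = 3 are placeholders I made up for illustration.

-- Write a new version that expires on its own after one day.
INSERT INTO foo (rowkey, family, qualifier, version, value)
VALUES ('row1', 'cf1', 'q1', 1700000000000, 0xcafe)
USING TTL 86400;

-- Read back at most the 3 newest versions; this relies on version
-- being clustered in descending order.
SELECT version, value FROM foo
WHERE rowkey = 'row1' AND family = 'cf1' AND qualifier = 'q1'
LIMIT 3;

The TTL bounds storage by age rather than by count, so it only matches
"keep N versions" when the write rate is steady, and the LIMIT only
hides older versions from the reader without removing them.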

In case anyone is curious, this question is related to some work that
I am doing to port a system built on HBase (which provides this
"keep the N most recent versions of a cell" behavior) to Cassandra
while giving the user an interface that is as similar as possible.

Best regards,
Clint
