Re: How to maintain the N-most-recent versions of a value?

Laing, Michael Fri, 18 Jul 2014 03:59:28 -0700

The cql you provided is invalid. You probably meant something like:

CREATE TABLE foo (
    rowkey text,
    family text,
    qualifier text,
    version int,
    value blob,
    PRIMARY KEY ((rowkey, family, qualifier), version))
WITH CLUSTERING ORDER BY (version DESC);

We use TTLs and LIMIT for structures like these, paying attention to the
construction of the partition key so that partition sizes are reasonable.
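
A minimal sketch of the TTL-plus-LIMIT pattern, assuming the table above.
Python is used here only to assemble the statement text. The helper names
and the one-week TTL are hypothetical and not taken from any driver or
from our production setup.

```python
# Hypothetical retention settings; pick values that match your write rate.
N_VERSIONS = 3
TTL_SECONDS = 7 * 24 * 3600  # assumed retention window of one week


def insert_statement(ttl_seconds: int) -> str:
    """CQL text for a write whose cells expire after ttl_seconds."""
    return (
        "INSERT INTO foo (rowkey, family, qualifier, version, value) "
        f"VALUES (?, ?, ?, ?, ?) USING TTL {int(ttl_seconds)}"
    )


def read_latest_statement(n: int) -> str:
    """CQL text for reading the n newest versions of one partition.

    Relies on CLUSTERING ORDER BY (version DESC), so the first n rows
    returned are the n highest versions.
    """
    return (
        "SELECT version, value FROM foo "
        "WHERE rowkey = ? AND family = ? AND qualifier = ? "
        f"LIMIT {int(n)}"
    )


print(insert_statement(TTL_SECONDS))
print(read_latest_statement(N_VERSIONS))
```

The TTL only approximates "keep N versions": it bounds storage by age, while
LIMIT bounds what a read returns.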


If the blob might be large, store it somewhere else. We use S3 but you
could also put it in another C* table.

In 2.1 the row cache may help as it will store N rows per recently accessed
partition, starting at the beginning of the partition.

ml


On Fri, Jul 18, 2014 at 6:30 AM, Benedict Elliott Smith <
belliottsm...@datastax.com> wrote:

> If the versions can be guaranteed to be adjacent (i.e. if the latest
> version is V, the prior version is V-1), you could issue a delete at the
> same time as an insert for V-N-(buffer), where buffer >= 0.
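
The paired insert and delete described above might be sketched as follows,
assuming integer versions that increase by exactly 1 and start at 0. The
function names, the first_version parameter, and the statement text are
illustrative assumptions rather than an existing API.

```python
from typing import List, Optional, Tuple

INSERT_CQL = (
    "INSERT INTO foo (rowkey, family, qualifier, version, value) "
    "VALUES (?, ?, ?, ?, ?)"
)
DELETE_CQL = (
    "DELETE FROM foo WHERE rowkey = ? AND family = ? "
    "AND qualifier = ? AND version = ?"
)


def version_to_delete(latest: int, keep: int, buffer: int = 0,
                      first_version: int = 0) -> Optional[int]:
    """Return V - N - buffer, or None if that version was never written."""
    if keep < 1 or buffer < 0:
        raise ValueError("keep must be >= 1 and buffer must be >= 0")
    target = latest - keep - buffer
    return target if target >= first_version else None


def statements_for_write(rowkey: str, family: str, qualifier: str,
                         version: int, value: bytes,
                         keep: int, buffer: int = 0) -> List[Tuple[str, tuple]]:
    """Build the insert for `version` plus the delete that accompanies it."""
    stmts = [(INSERT_CQL, (rowkey, family, qualifier, version, value))]
    old = version_to_delete(version, keep, buffer)
    if old is not None:
        stmts.append((DELETE_CQL, (rowkey, family, qualifier, old)))
    return stmts


# Writing version 10 while keeping N = 3 versions also deletes version 7.
for cql, params in statements_for_write("r", "f", "q", 10, b"x", keep=3):
    print(cql, params)
```

Each write removes only one old version, so this depends on every earlier
write having issued its own delete. Each delete also leaves a tombstone until
compaction removes it.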
>
> In general guaranteeing that is probably hard, so this seems like
> something that would be nice to have C* manage for you. Unfortunately we
> don't have anything on the roadmap to help with this. A custom compaction
> strategy might do the trick, or permitting some filter during compaction
> that can omit/tombstone certain records based on the input data. This
> latter option probably wouldn't be too hard to implement, although it might
> not offer any guarantees about expiring records in order without incurring
> extra compaction cost (you could reasonably easily guarantee the most
> recent N are present, but the cleaning up of older records might happen
> haphazardly, in no particular order, and without any promptness guarantees,
> if you want to do it cheaply). Feel free to file a ticket, or submit a
> patch!
>
>
> On Fri, Jul 18, 2014 at 1:32 AM, Clint Kelly <clint.ke...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I am trying to design a schema that will keep the N-most-recent
>> versions of a value.  Currently my table looks like the following:
>>
>> CREATE TABLE foo (
>>     rowkey text,
>>     family text,
>>     qualifier text,
>>     version long,
>>     value blob,
>>     PRIMARY KEY (rowkey, family, qualifier, version))
>> WITH CLUSTER ORDER BY (rowkey ASC, family ASC, qualifier ASC, version
>> DESC));
>>
>> Is there any standard design pattern for updating such a layout such
>> that I keep the N-most-recent (version, value) pairs for every unique
>> (rowkey, family, qualifier)?  I can't think of any way to do this
>> without doing a read-modify-write.  The best thing I can think of is
>> to use TTL to approximate the desired behavior (which will work if I
>> know how often we are writing new data to the table).  I could also
>> use "LIMIT N" in my queries to limit myself to only N items, but that
>> does not address any of the storage-size issues.
>>
>> In case anyone is curious, this question is related to some work that
>> I am doing translating a system built on HBase (which provides this
>> "keep the N-most-recent-version-of-a-cell" behavior) to Cassandra
>> while providing the user with as-similar-as-possible an interface.
>>
>> Best regards,
>> Clint
>>
>
>
