You might be interested in the following ticket: https://issues.apache.org/jira/browse/CASSANDRA-3929
There's a patch available that was not integrated because it's not possible to
guarantee exactly N values will be kept, and there are some other problems with
deletions, but it may be useful depending on your usage characteristics.

On Fri, Jul 18, 2014 at 7:58 AM, Laing, Michael <michael.la...@nytimes.com> wrote:

> The CQL you provided is invalid. You probably meant something like:
>
>     CREATE TABLE foo (
>         rowkey text,
>         family text,
>         qualifier text,
>         version int,
>         value blob,
>         PRIMARY KEY ((rowkey, family, qualifier), version))
>     WITH CLUSTERING ORDER BY (version DESC);
>
> We use TTLs and LIMIT for structures like these, paying attention to the
> construction of the partition key so that partition sizes are reasonable.
>
> If the blob might be large, store it somewhere else. We use S3 but you
> could also put it in another C* table.
>
> In 2.1 the row cache may help as it will store N rows per recently
> accessed partition, starting at the beginning of the partition.
>
> ml
>
> On Fri, Jul 18, 2014 at 6:30 AM, Benedict Elliott Smith <
> belliottsm...@datastax.com> wrote:
>
>> If the versions can be guaranteed to be adjacent (i.e. if the latest
>> version is V, the prior version is V-1), you could issue a delete at the
>> same time as an insert, for version V-N-(buffer) where buffer >= 0.
>>
>> In general guaranteeing that is probably hard, so this seems like
>> something that would be nice to have C* manage for you. Unfortunately we
>> don't have anything on the roadmap to help with this. A custom compaction
>> strategy might do the trick, or permitting some filter during compaction
>> that can omit/tombstone certain records based on the input data. This
>> latter option probably wouldn't be too hard to implement, although it might
>> not offer any guarantees about expiring records in order without incurring
>> extra compaction cost (you could reasonably easily guarantee the most
>> recent N are present, but the cleaning up of older records might happen
>> haphazardly, in no particular order, and without any promptness guarantees,
>> if you want to do it cheaply). Feel free to file a ticket, or submit a
>> patch!
>>
>> On Fri, Jul 18, 2014 at 1:32 AM, Clint Kelly <clint.ke...@gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I am trying to design a schema that will keep the N-most-recent
>>> versions of a value. Currently my table looks like the following:
>>>
>>>     CREATE TABLE foo (
>>>         rowkey text,
>>>         family text,
>>>         qualifier text,
>>>         version long,
>>>         value blob,
>>>         PRIMARY KEY (rowkey, family, qualifier, version))
>>>     WITH CLUSTER ORDER BY (rowkey ASC, family ASC, qualifier ASC, version DESC));
>>>
>>> Is there any standard design pattern for updating such a layout such
>>> that I keep the N-most-recent (version, value) pairs for every unique
>>> (rowkey, family, qualifier)? I can't think of any way to do this
>>> without doing a read-modify-write. The best thing I can think of is
>>> to use TTL to approximate the desired behavior (which will work if I
>>> know how often we are writing new data to the table). I could also
>>> use "LIMIT N" in my queries to limit myself to only N items, but that
>>> does not address any of the storage-size issues.
>>>
>>> In case anyone is curious, this question is related to some work that
>>> I am doing translating a system built on HBase (which provides this
>>> "keep the N-most-recent-versions-of-a-cell" behavior) to Cassandra
>>> while providing the user with an interface that is as similar as possible.
>>>
>>> Best regards,
>>> Clint

--
Paulo Motta
Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200
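
For concreteness, a minimal CQL sketch of the two suggestions in the thread,
written against Michael's corrected schema. The table name foo comes from the
thread; the sample keys, the version numbers, N = 3, buffer = 0, and the
86400-second TTL are illustrative assumptions, not values anyone proposed.

    -- Michael's TTL + LIMIT approach: let old versions expire, and read only
    -- the newest N. Newest-first order comes from CLUSTERING ORDER BY
    -- (version DESC) in the corrected schema.
    INSERT INTO foo (rowkey, family, qualifier, version, value)
    VALUES ('row1', 'fam1', 'qual1', 42, 0xcafebabe)
    USING TTL 86400;

    SELECT version, value
    FROM foo
    WHERE rowkey = 'row1' AND family = 'fam1' AND qualifier = 'qual1'
    LIMIT 3;

    -- Benedict's delete-at-insert idea, assuming adjacent version numbers:
    -- when writing version V, also delete version V - N - buffer.
    -- Here V = 42, N = 3, buffer = 0, so version 39 is removed.
    BEGIN BATCH
      INSERT INTO foo (rowkey, family, qualifier, version, value)
      VALUES ('row1', 'fam1', 'qual1', 42, 0xcafebabe);
      DELETE FROM foo
      WHERE rowkey = 'row1' AND family = 'fam1' AND qualifier = 'qual1'
        AND version = 39;
    APPLY BATCH;

As the thread notes, neither sketch guarantees that exactly N versions are
stored: LIMIT only bounds what a query reads, a TTL only approximates the write
rate, and the explicit delete relies on version numbers being dense.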