You might be interested in the following ticket: https://issues.apache.org/jira/browse/CASSANDRA-3929
There's a patch available that was not integrated because it's not possible to
guarantee exactly N values will be kept, and there are some other problems with
deletions, but it may be useful depending on your usage characteristics.

On Fri, Jul 18, 2014 at 7:58 AM, Laing, Michael <michael.la...@nytimes.com> wrote:

> The CQL you provided is invalid. You probably meant something like:
>
>     CREATE TABLE foo (
>         rowkey text,
>         family text,
>         qualifier text,
>         version int,
>         value blob,
>         PRIMARY KEY ((rowkey, family, qualifier), version))
>     WITH CLUSTERING ORDER BY (version DESC);
>
> We use TTLs and LIMIT for structures like these, paying attention to the
> construction of the partition key so that partition sizes are reasonable.
>
> If the blob might be large, store it somewhere else. We use S3 but you
> could also put it in another C* table.
>
> In 2.1 the row cache may help as it will store N rows per recently
> accessed partition, starting at the beginning of the partition.
>
> ml
>
> On Fri, Jul 18, 2014 at 6:30 AM, Benedict Elliott Smith <
> belliottsm...@datastax.com> wrote:
>
>> If the versions can be guaranteed to be adjacent (i.e. if the latest
>> version is V, the prior version is V-1), you could issue a delete at the
>> same time as an insert, for version V-N-(buffer) where buffer >= 0.
>>
>> In general guaranteeing that is probably hard, so this seems like
>> something that would be nice to have C* manage for you. Unfortunately we
>> don't have anything on the roadmap to help with this. A custom compaction
>> strategy might do the trick, or permitting some filter during compaction
>> that can omit/tombstone certain records based on the input data. This
>> latter option probably wouldn't be too hard to implement, although it might
>> not offer any guarantees about expiring records in order without incurring
>> extra compaction cost (you could reasonably easily guarantee the most
>> recent N are present, but the cleaning up of older records might happen
>> haphazardly, in no particular order, and without any promptness guarantees,
>> if you want to do it cheaply). Feel free to file a ticket, or submit a
>> patch!
>>
>> On Fri, Jul 18, 2014 at 1:32 AM, Clint Kelly <clint.ke...@gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I am trying to design a schema that will keep the N-most-recent
>>> versions of a value. Currently my table looks like the following:
>>>
>>>     CREATE TABLE foo (
>>>         rowkey text,
>>>         family text,
>>>         qualifier text,
>>>         version long,
>>>         value blob,
>>>         PRIMARY KEY (rowkey, family, qualifier, version))
>>>     WITH CLUSTER ORDER BY (rowkey ASC, family ASC, qualifier ASC, version DESC));
>>>
>>> Is there any standard design pattern for updating such a layout such
>>> that I keep the N-most-recent (version, value) pairs for every unique
>>> (rowkey, family, qualifier)? I can't think of any way to do this
>>> without doing a read-modify-write. The best thing I can think of is
>>> to use TTL to approximate the desired behavior (which will work if I
>>> know how often we are writing new data to the table). I could also
>>> use "LIMIT N" in my queries to limit myself to only N items, but that
>>> does not address any of the storage-size issues.
>>>
>>> In case anyone is curious, this question is related to some work that
>>> I am doing translating a system built on HBase (which provides this
>>> "keep the N-most-recent-versions-of-a-cell" behavior) to Cassandra
>>> while providing the user with an interface that is as similar as possible.
>>>
>>> Best regards,
>>> Clint

--
Paulo Motta
Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200
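
For concreteness, a minimal CQL sketch of the two suggestions in the thread,
written against Michael's corrected schema. The table name foo comes from the
thread; the sample keys, the version numbers, N = 3, buffer = 0, and the
86400-second TTL are illustrative assumptions, not values anyone proposed.

    -- Michael's TTL + LIMIT approach: let old versions expire, and read only
    -- the newest N. Newest-first order comes from CLUSTERING ORDER BY
    -- (version DESC) in the corrected schema.
    INSERT INTO foo (rowkey, family, qualifier, version, value)
    VALUES ('row1', 'fam1', 'qual1', 42, 0xcafebabe)
    USING TTL 86400;

    SELECT version, value
    FROM foo
    WHERE rowkey = 'row1' AND family = 'fam1' AND qualifier = 'qual1'
    LIMIT 3;

    -- Benedict's delete-at-insert idea, assuming adjacent version numbers:
    -- when writing version V, also delete version V - N - buffer.
    -- Here V = 42, N = 3, buffer = 0, so version 39 is removed.
    BEGIN BATCH
      INSERT INTO foo (rowkey, family, qualifier, version, value)
      VALUES ('row1', 'fam1', 'qual1', 42, 0xcafebabe);
      DELETE FROM foo
      WHERE rowkey = 'row1' AND family = 'fam1' AND qualifier = 'qual1'
        AND version = 39;
    APPLY BATCH;

As the thread notes, neither sketch guarantees that exactly N versions are
stored: LIMIT only bounds what a query reads, a TTL only approximates the write
rate, and the explicit delete relies on version numbers being dense.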