The time series doesn’t provide the access pattern I’m looking for. No way to query recently-modified documents.
On Jul 21, 2015, at 9:13 AM, Carlos Alonso <i...@mrcalonso.com<mailto:i...@mrcalonso.com>> wrote: Hi Robert, What about modelling it as a time serie? CREATE TABLE document ( docId UUID, doc TEXT, last_modified TIMESTAMP PRIMARY KEY(docId, last_modified) ) WITH CLUSTERING ORDER BY (last_modified DESC); This way, you the lastest modification will always be the first record in the row, therefore accessing it should be as easy as: SELECT * FROM document WHERE docId == <the docId> LIMIT 1; And, if you experience diskspace issues due to very long rows, then you can always expire old ones using TTL or on a batch job. Tombstones will never be a problem in this case as, due to the specified clustering order, the latest modification will always be first record in the row. Hope it helps. Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso> On 21 July 2015 at 05:59, Robert Wille <rwi...@fold3.com<mailto:rwi...@fold3.com>> wrote: Data structures that have a recently-modified access pattern seem to be a poor fit for Cassandra. I’m wondering if any of you smart guys can provide suggestions. For the sake of discussion, lets assume I have the following tables: CREATE TABLE document ( docId UUID, doc TEXT, last_modified TIMEUUID, PRIMARY KEY ((docid)) ) CREATE TABLE doc_by_last_modified ( date TEXT, last_modified TIMEUUID, docId UUID, PRIMARY KEY ((date), last_modified) ) When I update a document, I retrieve its last_modified time, delete the current record from doc_by_last_modified, and add a new one. Unfortunately, if you’d like each document to appear at most once in the doc_by_last_modified table, then this doesn’t work so well. Documents can get into the doc_by_last_modified table multiple times if there is concurrent access, or if there is a consistency issue. Any thoughts out there on how to efficiently provide recently-modified access to a table? This problem exists for many types of data structures, not just recently-modified. Any ordered data structure that can be dynamically reordered suffers from the same problems. As I’ve been doing schema design, this pattern keeps recurring. A nice way to address this problem has lots of applications. Thanks in advance for your thoughts Robert