Hi, data modeling question,
I have been investigating Cassandra as a way to store small objects, as a trivial replacement for S3. GET/PUT/DELETE are all easy, but LIST is what is tripping me up. S3 does a hierarchical list that more or less simulates traversing folders:

http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html

So say my schema is this:

    CREATE TABLE "stuff" (key BLOB PRIMARY KEY, value BLOB);

I know that the prefix part is easy with a ByteOrderedPartitioner (and possibly with a secondary index in Cassandra 3.x?). What trips me up is the delimiter part. I have looked at a handful of open source projects that are S3 clones built on Cassandra, and they seem to do the prefix match and then manually search for the delimiter. I have also looked at doing a UDA, but UDAs seem to send all of the data to a single node to do the aggregation.

What I am hoping to do is achieve what S3 does:

"List performance is not substantially affected by the total number of keys in your bucket, nor by the presence or absence of the prefix, marker, maxkeys, or delimiter arguments."
(http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysUsingAPIs.html)

Is there some sort of denormalization, indexing, or querying approach that I am missing that might help solve this? I think it would work if UDAs could do a summary operation on each node before returning, with the per-node results then being aggregated, but as far as I know that isn't possible. It seems like a binary search of each partition involved in the list prefix would be a really quick and easy way to return the first 1000 results.

Is this even possible using Cassandra?

Thanks,
Jake Willoughby
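
For concreteness, here is a rough sketch of the LIST behaviour described above: a prefix match, delimiter roll-up into common prefixes, and a binary-search skip past each "folder". It is only an illustration, not Cassandra code. The in-memory sorted list stands in for a byte-ordered key range, and the names list_keys and prefix_end are made up for this example.

```python
import bisect


def prefix_end(prefix):
    """Smallest byte string greater than every key starting with `prefix`.

    Returns None when no such bound exists (empty or all-0xff prefix).
    """
    stripped = prefix.rstrip(b"\xff")
    if not stripped:
        return None
    return stripped[:-1] + bytes([stripped[-1] + 1])


def list_keys(sorted_keys, prefix=b"", delimiter=b"/", max_keys=1000):
    """S3-style hierarchical LIST over a sorted list of byte-string keys.

    Assumption: `sorted_keys` is a plain Python list in byte order, used
    here as a stand-in for a byte-ordered key range in the real store.
    """
    contents, common_prefixes = [], []
    # Binary search to the first key >= prefix.
    i = bisect.bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and len(contents) + len(common_prefixes) < max_keys:
        key = sorted_keys[i]
        if not key.startswith(prefix):
            break  # walked past the end of the prefix range
        # Look for the delimiter only in the part after the prefix.
        cut = key.find(delimiter, len(prefix)) if delimiter else -1
        if cut == -1:
            contents.append(key)
            i += 1
            continue
        rolled = key[: cut + len(delimiter)]
        common_prefixes.append(rolled)
        # Skip every key under this common prefix with one more binary
        # search instead of scanning them one by one.
        end = prefix_end(rolled)
        i = len(sorted_keys) if end is None else bisect.bisect_left(sorted_keys, end, i)
    return contents, common_prefixes


keys = sorted([b"a/1", b"a/2", b"b", b"c/d/e", b"c/f"])
top_level = list_keys(keys)
under_c = list_keys(keys, prefix=b"c/")
```

With this shape, the cost of a LIST depends on the number of entries returned (each one costs at most one seek) rather than on how many keys sit underneath each common prefix, which is the property quoted from the S3 docs. Whether an equivalent per-partition seek can be expressed in Cassandra is exactly the open question.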