I've been looking at the get_range_slices feature and have found some odd behaviour I do not understand. Basically the keys returned in a range query do not match what I would expect to see. I think it may have something to do with the ordering of keys that I don't know about, but I'm just guessing.
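To make that guess concrete, here is a minimal sketch of how key order and hash order could differ. It assumes RandomPartitioner derives a key's token from the MD5 digest of the key, read as a signed big-endian integer and made non-negative. That is my understanding rather than something I have confirmed, and `md5_token` is a hypothetical helper of my own, not part of the Thrift API. The keys are the ones from my test below.

```python
import hashlib


def md5_token(key: bytes) -> int:
    """Hypothetical token: MD5 digest as a signed big-endian integer, made non-negative.

    Assumption: this mirrors how RandomPartitioner turns a key into a token.
    """
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


keys = [b"object1", b"object2", b"object3"]

# Order of the raw keys, compared byte by byte.
by_key = sorted(keys)

# Order of the same keys when compared by their (assumed) tokens.
by_token = sorted(keys, key=md5_token)

print("byte order: ", by_key)
print("token order:", by_token)
```

If the two orders differ, a range from one key to another would follow the token order, which might account for rows that look "missing" or "extra".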
Setup: Cassandra v0.6.1, single-node local install, RandomPartitioner. I'm using Python and my own thin wrapper around the Thrift Python API.

Step 1. Insert 3 keys into the "Standard1" column family, called "object1", "object2" and "object3", each with a single column called 'name' with a value like 'object1'.

Step 2. Do a get_range_slices call on the "Standard1" CF, for column names ["name"], with start_key "object1" and end_key "object3". I expect to see three results, but I only see results for object1 and object3. Below are the Thrift types I'm passing into the Cassandra.Client object:

- ColumnParent(column_family='Standard1', super_column=None)
- SlicePredicate(column_names=['name'], slice_range=None)
- KeyRange(end_key='object3', start_key='object1', count=4000, end_token=None, start_token=None)

and the output:

[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439, name='name', value='object1'), super_column=None)], key='object1'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362, name='name', value='object3'), super_column=None)], key='object3')]

Step 3. Modify the get_range_slices call so that the start_key is "object2". In this case I expect to see 2 rows returned, but I get 3. Thrift args and return are below:

- ColumnParent(column_family='Standard1', super_column=None)
- SlicePredicate(column_names=['name'], slice_range=None)
- KeyRange(end_key='object3', start_key='object2', count=4000, end_token=None, start_token=None)

and the output:

[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250265190715, name='name', value='object2'), super_column=None)], key='object2'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439, name='name', value='object1'), super_column=None)], key='object1'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362, name='name', value='object3'), super_column=None)], key='object3')]

Can anyone explain these odd results? As I said, I've got my own Python wrapper around the client, so I may be doing something wrong. But I've pulled out the Thrift objects as they go in and out of the Thrift Cassandra.Client, so I think I'm OK. (I have not noticed a systematic problem with my wrapper.)

On a more general note, is there information on the sort order of keys when using key ranges? I'm guessing the hashes of the keys are compared, and I'm wondering whether the hashes maintain the order of the original values. Also, I assume the order is byte order, rather than ASCII or UTF-8.

I was experimenting with the difference between column slicing and key slicing. In my case I could also write the keys in as column names (they are in buckets) and slice there first, then use the results to make a multi-key get. I'm trying to support features like "get me all the data where the key starts with 'foo.bar'".

Thanks for the fun project.
Aaron
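
P.S. A rough sketch of the bucket idea, in plain Python with no Cassandra calls. It assumes the column names in a bucket are kept in byte order. `prefix_bounds` and `slice_by_prefix` are hypothetical helpers of mine, and the column names are made up. I have not checked whether the finish bound of a real column slice is inclusive or exclusive, so the exclusive upper bound here is an assumption.

```python
from bisect import bisect_left


def prefix_bounds(prefix: bytes):
    """Return (start, end) so that start <= name < end matches names with the prefix.

    Assumes plain byte ordering. end is None when there is no upper bound.
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return prefix, None
    # Increment the last byte that is not 0xff to get the first non-matching name.
    end = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, end


def slice_by_prefix(sorted_names, prefix: bytes):
    """Pick the names starting with prefix from a byte-ordered list."""
    start, end = prefix_bounds(prefix)
    lo = bisect_left(sorted_names, start)
    hi = len(sorted_names) if end is None else bisect_left(sorted_names, end)
    return sorted_names[lo:hi]


# Made-up column names standing in for the keys stored in one bucket.
names = sorted([b"foo.bar", b"foo.bar.1", b"foo.bar.2", b"foo.baz", b"foo.ba", b"goo"])
matches = slice_by_prefix(names, b"foo.bar")
print(matches)
```

The names in `matches` would then be the keys passed to the multi-key get.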