I've broken this case down further to some Python code that works against the Thrift-generated client, and I am still getting the same odd results. With keys object1, object2 and object3, an open-ended get_range_slices call starting with "object1" only returns object1 and object3.
I'm guessing that I've got something wrong, or that my expectation of how get_range_slices works is wrong, but I cannot see where I've gone wrong. Any help would be appreciated. The Python code to add and read keys is below; it assumes a Cassandra.Client connection.

import time
from cassandra import Cassandra, ttypes
from thrift import Thrift
from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket, TTransport

def add_data(conn):
    # Insert a single column into each of the three test rows.
    col_path = ttypes.ColumnPath(column_family="Standard1", column="col_name")
    consistency = ttypes.ConsistencyLevel.QUORUM
    for key in ["object1", "object2", "object3"]:
        conn.insert("Keyspace1", key, col_path, "col_value",
                    int(time.time() * 1e6), consistency)

def read_range(conn, start_key, end_key):
    # Read "col_name" for up to 1000 rows between start_key and end_key.
    col_parent = ttypes.ColumnParent(column_family="Standard1")
    predicate = ttypes.SlicePredicate(column_names=["col_name"])
    key_range = ttypes.KeyRange(start_key=start_key, end_key=end_key, count=1000)
    consistency = ttypes.ConsistencyLevel.QUORUM
    return conn.get_range_slices("Keyspace1", col_parent, predicate,
                                 key_range, consistency)

Below is the result of calling read_range with different start values. I've also included the debug log for each call; the line starting with "reading RangeSliceCommand" seems to show that the key hash for "object2" is lower than the hashes for "object1" and "object3".
# expect to return objects 1, 2 and 3
In [37]: cass_test.read_range(conn, "object1", "")
Out[37]:
[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595268837, name='col_name', value='col_value'), super_column=None)], key='object1'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693, name='col_name', value='col_value'), super_column=None)], key='object3')]

DEBUG 09:29:59,791 range_slice
DEBUG 09:29:59,791 RangeSliceCommand{keyspace='Keyspace1', column_family='Standard1', super_column=null, predicate=SlicePredicate(column_names:[...@257b40fe]), range=[121587881847328893689247922008234581399,0], max_keys=1000}
DEBUG 09:29:59,791 Adding to restricted ranges [121587881847328893689247922008234581399,0] for (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
DEBUG 09:29:59,791 reading RangeSliceCommand{keyspace='Keyspace1', column_family='Standard1', super_column=null, predicate=SlicePredicate(column_names:[...@257b40fe]), range=[121587881847328893689247922008234581399,0], max_keys=1000} from 1...@localhost/127.0.0.1
DEBUG 09:29:59,791 Sending RangeSliceReply{rows=Row(key='object1', cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595268837,])),Row(key='object3', cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595272693,]))} to 1...@localhost/127.0.0.1
DEBUG 09:29:59,791 Processing response on a callback from 1...@localhost/127.0.0.1
DEBUG 09:29:59,791 range slices read object1
DEBUG 09:29:59,791 range slices read object3

In [38]: cass_test.read_range(conn, "object2", "")
Out[38]:
[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595271798, name='col_name', value='col_value'), super_column=None)], key='object2'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595268837, name='col_name', value='col_value'), super_column=None)], key='object1'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693, name='col_name', value='col_value'), super_column=None)], key='object3')]

DEBUG 09:34:48,133 range_slice
DEBUG 09:34:48,133 RangeSliceCommand{keyspace='Keyspace1', column_family='Standard1', super_column=null, predicate=SlicePredicate(column_names:[...@7966340c]), range=[28312518014678916505369931620527723964,0], max_keys=1000}
DEBUG 09:34:48,133 Adding to restricted ranges [28312518014678916505369931620527723964,0] for (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
DEBUG 09:34:48,133 reading RangeSliceCommand{keyspace='Keyspace1', column_family='Standard1', super_column=null, predicate=SlicePredicate(column_names:[...@7966340c]), range=[28312518014678916505369931620527723964,0], max_keys=1000} from 1...@localhost/127.0.0.1
DEBUG 09:34:48,133 Sending RangeSliceReply{rows=Row(key='object2', cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595271798,])),Row(key='object1', cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595268837,])),Row(key='object3', cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595272693,]))} to 1...@localhost/127.0.0.1
DEBUG 09:34:48,133 Processing response on a callback from 1...@localhost/127.0.0.1
DEBUG 09:34:48,133 range slices read object2
DEBUG 09:34:48,133 range slices read object1
DEBUG 09:34:48,133 range slices read object3

In [39]: cass_test.read_range(conn, "object3", "")
Out[39]:
[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693, name='col_name', value='col_value'), super_column=None)], key='object3')]

DEBUG 09:35:26,090 range_slice
DEBUG 09:35:26,090 RangeSliceCommand{keyspace='Keyspace1', column_family='Standard1', super_column=null, predicate=SlicePredicate(column_names:[...@24e33e18]), range=[123092639156685888118746480803115294277,0], max_keys=1000}
DEBUG 09:35:26,090 Adding to restricted ranges [123092639156685888118746480803115294277,0] for (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
DEBUG 09:35:26,090 reading RangeSliceCommand{keyspace='Keyspace1', column_family='Standard1', super_column=null, predicate=SlicePredicate(column_names:[...@24e33e18]), range=[123092639156685888118746480803115294277,0], max_keys=1000} from 1...@localhost/127.0.0.1
DEBUG 09:35:26,090 Sending RangeSliceReply{rows=Row(key='object3', cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595272693,]))} to 1...@localhost/127.0.0.1
DEBUG 09:35:26,090 Processing response on a callback from 1...@localhost/127.0.0.1
DEBUG 09:35:26,090 range slices read object3

thanks
Aaron

On Sun, 25 Apr 2010 20:23:05 -0700, aaron <aa...@the-mortons.org> wrote:
> I've been looking at the get_range_slices feature and have found some odd
> behaviour I do not understand. Basically the keys returned in a range query
> do not match what I would expect to see. I think it may have something to
> do with the ordering of keys that I don't know about, but I'm just
> guessing.
>
> On Cassandra v 0.6.1, single node local install; RandomPartitioner. Using
> Python and my own thin wrapper around the Thrift Python API.
>
> Step 1.
>
> Insert 3 keys into the "Standard1" column family, called "object1",
> "object2" and "object3", each with a single column called 'name' with a
> value like 'object1'
>
> Step 2.
>
> Do a get_range_slices call in the "Standard1" CF, for column names
> ["name"] with start_key "object1" and end_key "object3". I expect to see
> three results, but I only see results for object1 and object3. Below are
> the thrift types I'm passing into the Cassandra.Client object...
>
> - ColumnParent(column_family='Standard1', super_column=None)
> - SlicePredicate(column_names=['name'], slice_range=None)
> - KeyRange(end_key='object3', start_key='object1', count=4000,
> end_token=None, start_token=None)
>
> and the output
>
> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439,
> name='name', value='object1'), super_column=None)], key='object1'),
> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362,
> name='name', value='object3'), super_column=None)], key='object3')]
>
> Step 3.
>
> Modify the get_range_slices call, so the start_key is object2. In this case
> I expect to see 2 rows returned, but I get 3. Thrift args and return are
> below...
>
> - ColumnParent(column_family='Standard1', super_column=None)
> - SlicePredicate(column_names=['name'], slice_range=None)
> - KeyRange(end_key='object3', start_key='object2', count=4000,
> end_token=None, start_token=None)
>
> and the output
>
> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250265190715,
> name='name', value='object2'), super_column=None)], key='object2'),
> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439,
> name='name', value='object1'), super_column=None)], key='object1'),
> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362,
> name='name', value='object3'), super_column=None)], key='object3')]
>
>
>
> Can anyone explain these odd results? As I said I've got my own Python
> wrapper around the client, so I may be doing something wrong. But I've
> pulled out the thrift objects and they go in and out of the thrift
> Cassandra.Client, so I think I'm ok. (I have not noticed a systematic
> problem with my wrapper).
>
> On a more general note, is there information on the sort order of keys when
> using key ranges? I'm guessing the hash of the keys is compared and I'm
> wondering if the hashes of the keys maintain the order of the original
> values? Also I assume the order is byte order, rather than ascii or utf8.
>
> I was experimenting with the difference between column slicing and key
> slicing. In my case I could write the keys in as column names (they are in
> buckets) as well and slice there first, then use the results to make a
> multi key get. I'm trying to support features like, get me all the data
> where the key starts with "foo.bar".
>
> Thanks for the fun project.
>
> Aaron
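
On the sort-order question above, here is a minimal sketch of how the debug logs can be read. It assumes that RandomPartitioner returns rows in ascending token order, and that the first number in each range=[...] field of the logs is the token of the start key. The helper name simulate_open_range is hypothetical, and it only models the single-node, open-ended case (end_key ""), not Cassandra's actual range logic.

```python
# Tokens copied from the range=[...] field of the three debug logs.
# Assumption: each is the RandomPartitioner token of the start key used.
TOKENS = {
    "object1": 121587881847328893689247922008234581399,
    "object2": 28312518014678916505369931620527723964,
    "object3": 123092639156685888118746480803115294277,
}


def simulate_open_range(start_key, tokens=TOKENS):
    """Hypothetical model of an open-ended range read on one node.

    Returns the keys whose token is >= the start key's token,
    ordered by token rather than by key name.
    """
    start = tokens[start_key]
    in_range = [(tok, key) for key, tok in tokens.items() if tok >= start]
    return [key for tok, key in sorted(in_range)]


if __name__ == "__main__":
    for key in sorted(TOKENS):
        print("%s -> %s" % (key, simulate_open_range(key)))
```

Sorted by token the keys run object2, object1, object3, so under these assumptions the sketch gives the same rows as the three read_range calls: two rows from "object1", three from "object2" and one from "object3".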