Thanks Jonathan.

After looking at the Lucandra code I realized my confusion has to do with
get_range_slices and the RandomPartitioner (RP). When I switched to the
OrderPreservingPartitioner (OPP) I got the expected behaviour.

I was noticing cases under the RandomPartitioner where keys I expected to be
returned were not. Can you give a little advice on the expected behaviour of
get_range_slices with the RP, and I'll try to write a JUnit test for it?
E.g. is it essentially the same as under the OPP, but with undefined order?

Thanks
Aaron

On Mon, 3 May 2010 10:27:37 -0500, Jonathan Ellis <jbel...@gmail.com> wrote:
> Util.range returns a Range object which is end-exclusive. (You want
> "Bounds" for end-inclusive.)
>
> On Sun, May 2, 2010 at 7:19 AM, aaron morton <aa...@thelastpickle.com>
> wrote:
>> Hi there, I'm still getting odd behavior with get_range_slices. I've
>> created a JUnit test that illustrates the case.
>> Could someone take a look and either let me know where my
>> understanding is wrong or whether this is a real issue?
>>
>> I added the following to ColumnFamilyStoreTest.java
>>
>>     private ColumnFamilyStore insertKey1Key2Key3() throws IOException,
>>             ExecutionException, InterruptedException
>>     {
>>         List<RowMutation> rms = new LinkedList<RowMutation>();
>>         RowMutation rm;
>>         rm = new RowMutation("Keyspace2", "key1".getBytes());
>>         rm.add(new QueryPath("Standard1", null, "Column1".getBytes()),
>>                "asdf".getBytes(), 0);
>>         rms.add(rm);
>>
>>         rm = new RowMutation("Keyspace2", "key2".getBytes());
>>         rm.add(new QueryPath("Standard1", null, "Column1".getBytes()),
>>                "asdf".getBytes(), 0);
>>         rms.add(rm);
>>
>>         rm = new RowMutation("Keyspace2", "key3".getBytes());
>>         rm.add(new QueryPath("Standard1", null, "Column1".getBytes()),
>>                "asdf".getBytes(), 0);
>>         rms.add(rm);
>>         return Util.writeColumnFamily(rms);
>>     }
>>
>>     @Test
>>     public void testThreeKeyRangeAll() throws IOException,
>>             ExecutionException, InterruptedException
>>     {
>>         ColumnFamilyStore cfs = insertKey1Key2Key3();
>>
>>         IPartitioner p = StorageService.getPartitioner();
>>         RangeSliceReply result =
>>             cfs.getRangeSlice(ArrayUtils.EMPTY_BYTE_ARRAY,
>>                               Util.range(p, "key1", "key3"),
>>                               10,
>>                               null,
>>                               Arrays.asList("Column1".getBytes()));
>>         assertEquals(3, result.rows.size());
>>     }
>>
>>     @Test
>>     public void testThreeKeyRangeSkip1() throws IOException,
>>             ExecutionException, InterruptedException
>>     {
>>         ColumnFamilyStore cfs = insertKey1Key2Key3();
>>
>>         IPartitioner p = StorageService.getPartitioner();
>>         RangeSliceReply result =
>>             cfs.getRangeSlice(ArrayUtils.EMPTY_BYTE_ARRAY,
>>                               Util.range(p, "key2", "key3"),
>>                               10,
>>                               null,
>>                               Arrays.asList("Column1".getBytes()));
>>         assertEquals(2, result.rows.size());
>>     }
>>
>> Running this with "ant test" the partial output is....
>>
>>     [junit] Testsuite: org.apache.cassandra.db.ColumnFamilyStoreTest
>>     [junit] Tests run: 7, Failures: 2, Errors: 0, Time elapsed: 1.405 sec
>>     [junit]
>>     [junit] Testcase: testThreeKeyRangeAll(org.apache.cassandra.db.ColumnFamilyStoreTest): FAILED
>>     [junit] expected:<3> but was:<2>
>>     [junit] junit.framework.AssertionFailedError: expected:<3> but was:<2>
>>     [junit]     at org.apache.cassandra.db.ColumnFamilyStoreTest.testThreeKeyRangeAll(ColumnFamilyStoreTest.java:170)
>>     [junit]
>>     [junit]
>>     [junit] Testcase: testThreeKeyRangeSkip1(org.apache.cassandra.db.ColumnFamilyStoreTest): FAILED
>>     [junit] expected:<2> but was:<1>
>>     [junit] junit.framework.AssertionFailedError: expected:<2> but was:<1>
>>     [junit]     at org.apache.cassandra.db.ColumnFamilyStoreTest.testThreeKeyRangeSkip1(ColumnFamilyStoreTest.java:184)
>>     [junit]
>>     [junit]
>>     [junit] Test org.apache.cassandra.db.ColumnFamilyStoreTest FAILED
>>
>> Any help appreciated.
>>
>> Aaron
>>
>> On 27 Apr 2010, at 09:38, aaron wrote:
>>
>>> I've broken this case down further to some Python code that works
>>> against the Thrift-generated client and am still getting the same odd
>>> results. With keys object1, object2 and object3, an open-ended
>>> get_range_slices starting with "object1" only returns object1 and
>>> object3.
>>>
>>> I'm guessing that I've got something wrong or my expectation of how
>>> get_range_slices works is wrong, but I cannot see where I've gone
>>> wrong. Any help would be appreciated.
>>>
>>> The Python code to add and read keys is below; it assumes a
>>> Cassandra.Client connection.
>>>
>>> import time
>>> from cassandra import Cassandra,ttypes
>>> from thrift import Thrift
>>> from thrift.protocol import TBinaryProtocol
>>> from thrift.transport import TSocket, TTransport
>>>
>>>
>>> def add_data(conn):
>>>
>>>     col_path = ttypes.ColumnPath(column_family="Standard1",
>>>                                  column="col_name")
>>>     consistency = ttypes.ConsistencyLevel.QUORUM
>>>
>>>     for key in ["object1", "object2", "object3"]:
>>>         conn.insert("Keyspace1", key, col_path, "col_value",
>>>                     int(time.time() * 1e6), consistency)
>>>     return
>>>
>>> def read_range(conn, start_key, end_key):
>>>
>>>     col_parent = ttypes.ColumnParent(column_family="Standard1")
>>>
>>>     predicate = ttypes.SlicePredicate(column_names=["col_name"])
>>>     # named key_range so the built-in range() is not shadowed
>>>     key_range = ttypes.KeyRange(start_key=start_key, end_key=end_key,
>>>                                 count=1000)
>>>     consistency = ttypes.ConsistencyLevel.QUORUM
>>>
>>>     return conn.get_range_slices("Keyspace1", col_parent,
>>>                                  predicate, key_range, consistency)
>>>
>>>
>>> Below is the result of calling read_range with different start values.
>>> I've also included the debug log for each call. The line starting with
>>> "reading RangeSliceCommand" seems to show that the key hash for
>>> "object2" is smaller than the hashes for "object1" and "object3".
>>>
>>> # expect to return objects 1, 2 and 3
>>>
>>> In [37]: cass_test.read_range(conn, "object1", "")
>>> Out[37]:
>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595268837,
>>>     name='col_name', value='col_value'), super_column=None)],
>>>     key='object1'),
>>>  KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693,
>>>     name='col_name', value='col_value'), super_column=None)],
>>>     key='object3')]
>>>
>>> DEBUG 09:29:59,791 range_slice
>>> DEBUG 09:29:59,791 RangeSliceCommand{keyspace='Keyspace1',
>>>     column_family='Standard1', super_column=null,
>>>     predicate=SlicePredicate(column_names:[...@257b40fe]),
>>>     range=[121587881847328893689247922008234581399,0], max_keys=1000}
>>> DEBUG 09:29:59,791 Adding to restricted ranges
>>>     [121587881847328893689247922008234581399,0] for
>>>     (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
>>> DEBUG 09:29:59,791 reading RangeSliceCommand{keyspace='Keyspace1',
>>>     column_family='Standard1', super_column=null,
>>>     predicate=SlicePredicate(column_names:[...@257b40fe]),
>>>     range=[121587881847328893689247922008234581399,0], max_keys=1000}
>>>     from 1...@localhost/127.0.0.1
>>> DEBUG 09:29:59,791 Sending RangeSliceReply{rows=Row(key='object1',
>>>     cf=ColumnFamily(Standard1
>>>     [636f6c5f6e616d65:false:9...@1272315595268837,])),Row(key='object3',
>>>     cf=ColumnFamily(Standard1
>>>     [636f6c5f6e616d65:false:9...@1272315595272693,]))}
>>>     to 1...@localhost/127.0.0.1
>>> DEBUG 09:29:59,791 Processing response on a callback from
>>>     1...@localhost/127.0.0.1
>>> DEBUG 09:29:59,791 range slices read object1
>>> DEBUG 09:29:59,791 range slices read object3
>>>
>>>
>>> In [38]: cass_test.read_range(conn, "object2", "")
>>> Out[38]:
>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595271798,
>>>     name='col_name', value='col_value'), super_column=None)],
>>>     key='object2'),
>>>  KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595268837,
>>>     name='col_name', value='col_value'), super_column=None)],
>>>     key='object1'),
>>>  KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693,
>>>     name='col_name', value='col_value'), super_column=None)],
>>>     key='object3')]
>>>
>>> DEBUG 09:34:48,133 range_slice
>>> DEBUG 09:34:48,133 RangeSliceCommand{keyspace='Keyspace1',
>>>     column_family='Standard1', super_column=null,
>>>     predicate=SlicePredicate(column_names:[...@7966340c]),
>>>     range=[28312518014678916505369931620527723964,0], max_keys=1000}
>>> DEBUG 09:34:48,133 Adding to restricted ranges
>>>     [28312518014678916505369931620527723964,0] for
>>>     (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
>>> DEBUG 09:34:48,133 reading RangeSliceCommand{keyspace='Keyspace1',
>>>     column_family='Standard1', super_column=null,
>>>     predicate=SlicePredicate(column_names:[...@7966340c]),
>>>     range=[28312518014678916505369931620527723964,0], max_keys=1000}
>>>     from 1...@localhost/127.0.0.1
>>> DEBUG 09:34:48,133 Sending RangeSliceReply{rows=Row(key='object2',
>>>     cf=ColumnFamily(Standard1
>>>     [636f6c5f6e616d65:false:9...@1272315595271798,])),Row(key='object1',
>>>     cf=ColumnFamily(Standard1
>>>     [636f6c5f6e616d65:false:9...@1272315595268837,])),Row(key='object3',
>>>     cf=ColumnFamily(Standard1
>>>     [636f6c5f6e616d65:false:9...@1272315595272693,]))}
>>>     to 1...@localhost/127.0.0.1
>>> DEBUG 09:34:48,133 Processing response on a callback from
>>>     1...@localhost/127.0.0.1
>>> DEBUG 09:34:48,133 range slices read object2
>>> DEBUG 09:34:48,133 range slices read object1
>>> DEBUG 09:34:48,133 range slices read object3
>>>
>>>
>>> In [39]: cass_test.read_range(conn, "object3", "")
>>> Out[39]:
>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693,
>>>     name='col_name', value='col_value'), super_column=None)],
>>>     key='object3')]
>>>
>>> DEBUG 09:35:26,090 range_slice
>>> DEBUG 09:35:26,090 RangeSliceCommand{keyspace='Keyspace1',
>>>     column_family='Standard1', super_column=null,
>>>     predicate=SlicePredicate(column_names:[...@24e33e18]),
>>>     range=[123092639156685888118746480803115294277,0], max_keys=1000}
>>> DEBUG 09:35:26,090 Adding to restricted ranges
>>>     [123092639156685888118746480803115294277,0] for
>>>     (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
>>> DEBUG 09:35:26,090 reading RangeSliceCommand{keyspace='Keyspace1',
>>>     column_family='Standard1', super_column=null,
>>>     predicate=SlicePredicate(column_names:[...@24e33e18]),
>>>     range=[123092639156685888118746480803115294277,0], max_keys=1000}
>>>     from 1...@localhost/127.0.0.1
>>> DEBUG 09:35:26,090 Sending RangeSliceReply{rows=Row(key='object3',
>>>     cf=ColumnFamily(Standard1
>>>     [636f6c5f6e616d65:false:9...@1272315595272693,]))}
>>>     to 1...@localhost/127.0.0.1
>>> DEBUG 09:35:26,090 Processing response on a callback from
>>>     1...@localhost/127.0.0.1
>>> DEBUG 09:35:26,090 range slices read object3
>>>
>>>
>>> thanks
>>> Aaron
>>>
>>>
>>> On Sun, 25 Apr 2010 20:23:05 -0700, aaron <aa...@the-mortons.org> wrote:
>>>>
>>>> I've been looking at the get_range_slices feature and have found
>>>> some odd behaviour I do not understand. Basically the keys returned
>>>> in a range query do not match what I would expect to see. I think it
>>>> may have something to do with the ordering of keys that I don't know
>>>> about, but I'm just guessing.
>>>>
>>>> On Cassandra v 0.6.1, single node local install; RandomPartitioner.
>>>> Using Python and my own thin wrapper around the Thrift Python API.
>>>>
>>>> Step 1.
>>>>
>>>> Insert 3 keys into the "Standard1" column family, called "object1",
>>>> "object2" and "object3", each with a single column called 'name'
>>>> with a value like 'object1'.
>>>>
>>>> Step 2.
>>>>
>>>> Do a get_range_slices call in the "Standard1" CF, for column names
>>>> ["name"] with start_key "object1" and end_key "object3". I expect to
>>>> see three results, but I only see results for object1 and object3.
>>>> Below are the Thrift types I'm passing into the Cassandra.Client
>>>> object...
>>>>
>>>> - ColumnParent(column_family='Standard1', super_column=None)
>>>> - SlicePredicate(column_names=['name'], slice_range=None)
>>>> - KeyRange(end_key='object3', start_key='object1', count=4000,
>>>>   end_token=None, start_token=None)
>>>>
>>>> and the output
>>>>
>>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439,
>>>>     name='name', value='object1'), super_column=None)], key='object1'),
>>>>  KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362,
>>>>     name='name', value='object3'), super_column=None)], key='object3')]
>>>>
>>>> Step 3.
>>>>
>>>> Modify the get_range_slices call, so the start_key is object2. In
>>>> this case I expect to see 2 rows returned, but I get 3. Thrift args
>>>> and return are below...
>>>>
>>>> - ColumnParent(column_family='Standard1', super_column=None)
>>>> - SlicePredicate(column_names=['name'], slice_range=None)
>>>> - KeyRange(end_key='object3', start_key='object2', count=4000,
>>>>   end_token=None, start_token=None)
>>>>
>>>> and the output
>>>>
>>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250265190715,
>>>>     name='name', value='object2'), super_column=None)], key='object2'),
>>>>  KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439,
>>>>     name='name', value='object1'), super_column=None)], key='object1'),
>>>>  KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362,
>>>>     name='name', value='object3'), super_column=None)], key='object3')]
>>>>
>>>>
>>>> Can anyone explain these odd results? As I said I've got my own
>>>> Python wrapper around the client, so I may be doing something wrong.
>>>> But I've pulled out the Thrift objects and they go in and out of the
>>>> Thrift Cassandra.Client, so I think I'm ok. (I have not noticed a
>>>> systematic problem with my wrapper).
>>>>
>>>> On a more general note, is there information on the sort order of
>>>> keys when using key ranges? I'm guessing the hash of the keys is
>>>> compared, and I'm wondering if the hashes of the keys maintain the
>>>> order of the original values? Also I assume the order is byte order,
>>>> rather than ASCII or UTF-8.
>>>>
>>>> I was experimenting with the difference between column slicing and
>>>> key slicing. In my case I could write the keys in as column names
>>>> (they are in buckets) as well and slice there first, then use the
>>>> results to make a multi-key get. I'm trying to support features
>>>> like "get me all the data where the key starts with 'foo.bar'".
>>>>
>>>> Thanks for the fun project.
>>>>
>>>> Aaron
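
To make the ordering question in the thread concrete, here is a minimal sketch (plain Python, not Cassandra code) that replays the quoted queries against the tokens printed in the "range=[<token>,0]" entries of the debug log. The names LOG_TOKENS, md5_token and keys_in_range are hypothetical helpers introduced only for illustration. md5_token rests on an assumption: that a RandomPartitioner token is the absolute value of the key's MD5 digest read as a signed big-endian integer. It has not been checked against the logged values, so the replay uses the logged tokens directly. Ring wrap-around is ignored for simplicity.

```python
import hashlib

# Tokens copied from the "range=[<token>,0]" entries in the quoted debug log.
LOG_TOKENS = {
    "object1": 121587881847328893689247922008234581399,
    "object2": 28312518014678916505369931620527723964,
    "object3": 123092639156685888118746480803115294277,
}


def md5_token(key: bytes) -> int:
    """ASSUMPTION (not verified against the log): the token is the absolute
    value of the MD5 digest interpreted as a signed big-endian integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def keys_in_range(tokens, start_key, end_key=""):
    """Return the keys whose token lies in [token(start_key), token(end_key)],
    sorted by token. An empty end_key means "no upper bound", mirroring the
    open-ended calls in the log. Simplified: no ring wrap-around."""
    start = tokens[start_key]
    end = tokens[end_key] if end_key else None
    hits = [key for key, token in tokens.items()
            if token >= start and (end is None or token <= end)]
    return sorted(hits, key=tokens.get)


if __name__ == "__main__":
    for start in ("object1", "object2", "object3"):
        print(start, "->", keys_in_range(LOG_TOKENS, start))
```

With the logged tokens, the sketch selects the same keys in the same order as the three logged replies and the Step 2 and Step 3 outputs. That is consistent with rows being compared and returned by token rather than by key, which would explain why "object2" is missing from a range that starts at "object1".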