Re: strange get_range_slices behaviour v0.6.1

aaron Tue, 04 May 2010 14:18:21 -0700

Thanks Jonathan.

After looking at the Lucandra code I realized my confusion has to do with
get_range_slices and the RandomPartitioner. When I switched to the OPP I got
the expected behaviour.
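
For anyone following along, here is a minimal sketch of what I think is going
on, assuming the RandomPartitioner compares keys by token rather than by the
key bytes. The token values are copied from the DEBUG log lines in my earlier
message quoted below. The helper names range_by_token and range_by_key are
made up for illustration and are not part of Cassandra, and the model ignores
ring wrap-around and multiple nodes.

```python
# Tokens copied from the "RangeSliceCommand{... range=[<token>,0] ...}"
# DEBUG lines quoted later in this thread.
TOKENS = {
    "object1": 121587881847328893689247922008234581399,
    "object2": 28312518014678916505369931620527723964,
    "object3": 123092639156685888118746480803115294277,
}


def range_by_token(start_key, end_key=""):
    """Toy model: keys whose token lies in [token(start), token(end)].

    An empty end_key is treated as "up to the end of the ring", which is an
    assumption about how the open-ended calls in the log behave.
    """
    start = TOKENS[start_key]
    end = TOKENS[end_key] if end_key else None
    hits = [k for k, t in TOKENS.items()
            if t >= start and (end is None or t <= end)]
    # Rows come back in token order, not key order.
    return sorted(hits, key=TOKENS.get)


def range_by_key(start_key, end_key=""):
    """What a plain comparison on the key strings would return instead."""
    return sorted(k for k in TOKENS
                  if k >= start_key and (not end_key or k <= end_key))


if __name__ == "__main__":
    for start in ("object1", "object2", "object3"):
        print(start, range_by_token(start), range_by_key(start))
```

In this toy model, starting at "object1" gives object1 and object3, starting
at "object2" gives all three keys with object2 first, and starting at
"object3" gives only object3, which lines up with the three read_range calls
in the log.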



I was noticing cases under the RandomPartitioner where keys I expected to be
returned were not. Can you give a little advice on the expected behaviour of
get_range_slices with the RP, and I'll try to write a JUnit test for it? E.g.
is it essentially the same as under the OPP, but with the order undefined?
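
To make the question concrete: my understanding, which I have not confirmed
against the source, is that the RP token is the absolute value of the key's
MD5 digest read as a signed big-endian integer. Under that assumption, the
scan order for a set of keys could be previewed with something like the
sketch below. The names rp_token and order_by_token are hypothetical, and I
have not checked that this reproduces the token values in the debug log.

```python
import hashlib


def rp_token(key):
    """Assumed RandomPartitioner token: abs(MD5 digest as a signed integer)."""
    data = key.encode("utf-8") if isinstance(key, str) else key
    digest = hashlib.md5(data).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def order_by_token(keys):
    """Return the keys in the order a token-ordered scan would visit them."""
    return sorted(keys, key=rp_token)


if __name__ == "__main__":
    for key in order_by_token(["key1", "key2", "key3"]):
        print(key, rp_token(key))
```

If that assumption holds, a test for the RP would need to compare result sets
against token order rather than against the lexical order of the keys.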

Thanks
Aaron


On Mon, 3 May 2010 10:27:37 -0500, Jonathan Ellis <jbel...@gmail.com>
wrote:
> Util.range returns a Range object which is end-exclusive.  (You want
> "Bounds" for end-inclusive.)
> 
> On Sun, May 2, 2010 at 7:19 AM, aaron morton <aa...@thelastpickle.com>
> wrote:
>> Hi there, I'm still getting odd behavior with get_range_slices. I've
>> created a JUnit test that illustrates the case.
>> Could someone take a look and either let me know where my understanding
>> is wrong, or whether this is a real issue?
>>
>>
>> I added the following to ColumnFamilyStoreTest.java
>>
>>
>>    private ColumnFamilyStore insertKey1Key2Key3()
>>        throws IOException, ExecutionException, InterruptedException
>>    {
>>        List<RowMutation> rms = new LinkedList<RowMutation>();
>>        RowMutation rm;
>>        rm = new RowMutation("Keyspace2", "key1".getBytes());
>>        rm.add(new QueryPath("Standard1", null, "Column1".getBytes()),
>>               "asdf".getBytes(), 0);
>>        rms.add(rm);
>>
>>        rm = new RowMutation("Keyspace2", "key2".getBytes());
>>        rm.add(new QueryPath("Standard1", null, "Column1".getBytes()),
>>               "asdf".getBytes(), 0);
>>        rms.add(rm);
>>
>>        rm = new RowMutation("Keyspace2", "key3".getBytes());
>>        rm.add(new QueryPath("Standard1", null, "Column1".getBytes()),
>>               "asdf".getBytes(), 0);
>>        rms.add(rm);
>>        return Util.writeColumnFamily(rms);
>>    }
>>
>>
>>    @Test
>>    public void testThreeKeyRangeAll()
>>        throws IOException, ExecutionException, InterruptedException
>>    {
>>        ColumnFamilyStore cfs = insertKey1Key2Key3();
>>
>>        IPartitioner p = StorageService.getPartitioner();
>>        RangeSliceReply result =
>>            cfs.getRangeSlice(ArrayUtils.EMPTY_BYTE_ARRAY,
>>                              Util.range(p, "key1", "key3"),
>>                              10,
>>                              null,
>>                              Arrays.asList("Column1".getBytes()));
>>        assertEquals(3, result.rows.size());
>>    }
>>
>>    @Test
>>    public void testThreeKeyRangeSkip1()
>>        throws IOException, ExecutionException, InterruptedException
>>    {
>>        ColumnFamilyStore cfs = insertKey1Key2Key3();
>>
>>        IPartitioner p = StorageService.getPartitioner();
>>        RangeSliceReply result =
>>            cfs.getRangeSlice(ArrayUtils.EMPTY_BYTE_ARRAY,
>>                              Util.range(p, "key2", "key3"),
>>                              10,
>>                              null,
>>                              Arrays.asList("Column1".getBytes()));
>>        assertEquals(2, result.rows.size());
>>    }
>>
>> Running this with "ant test" the partial output is....
>>
>>    [junit] Testsuite: org.apache.cassandra.db.ColumnFamilyStoreTest
>>    [junit] Tests run: 7, Failures: 2, Errors: 0, Time elapsed: 1.405 sec
>>    [junit]
>>    [junit] Testcase: testThreeKeyRangeAll(org.apache.cassandra.db.ColumnFamilyStoreTest): FAILED
>>    [junit] expected:<3> but was:<2>
>>    [junit] junit.framework.AssertionFailedError: expected:<3> but was:<2>
>>    [junit]     at org.apache.cassandra.db.ColumnFamilyStoreTest.testThreeKeyRangeAll(ColumnFamilyStoreTest.java:170)
>>    [junit]
>>    [junit]
>>    [junit] Testcase: testThreeKeyRangeSkip1(org.apache.cassandra.db.ColumnFamilyStoreTest): FAILED
>>    [junit] expected:<2> but was:<1>
>>    [junit] junit.framework.AssertionFailedError: expected:<2> but was:<1>
>>    [junit]     at org.apache.cassandra.db.ColumnFamilyStoreTest.testThreeKeyRangeSkip1(ColumnFamilyStoreTest.java:184)
>>    [junit]
>>    [junit]
>>    [junit] Test org.apache.cassandra.db.ColumnFamilyStoreTest FAILED
>>
>>
>> Any help appreciated.
>>
>> Aaron
>>
>>
>> On 27 Apr 2010, at 09:38, aaron wrote:
>>
>>>
>>> I've broken this case down further to some Python code that works against
>>> the Thrift-generated client and am still getting the same odd results.
>>> With keys object1, object2 and object3, an open-ended get_range_slice
>>> starting with "object1" only returns object1 and 2.
>>>
>>> I'm guessing that I've got something wrong or my expectation of how
>>> get_range_slice works is wrong, but I cannot see where I've gone wrong.
>>> Any help would be appreciated.
>>>
>>> The Python code to add and read keys is below; it assumes a
>>> Cassandra.Client connection.
>>>
>>> import time
>>> from cassandra import Cassandra,ttypes
>>> from thrift import Thrift
>>> from thrift.protocol import TBinaryProtocol
>>> from thrift.transport import TSocket, TTransport
>>>
>>>
>>> def add_data(conn):
>>>
>>>   col_path = ttypes.ColumnPath(column_family="Standard1",
>>> column="col_name")
>>>   consistency = ttypes.ConsistencyLevel.QUORUM
>>>
>>>   for key in ["object1", "object2", "object3"]:
>>>       conn.insert("Keyspace1", key, col_path, "col_value",
>>>           int(time.time() * 1e6), consistency)
>>>   return
>>>
>>> def read_range(conn, start_key, end_key):
>>>
>>>   col_parent = ttypes.ColumnParent(column_family="Standard1")
>>>
>>>   predicate = ttypes.SlicePredicate(column_names=["col_name"])
>>>   range = ttypes.KeyRange(start_key=start_key, end_key=end_key,
>>> count=1000)
>>>   consistency = ttypes.ConsistencyLevel.QUORUM
>>>
>>>   return conn.get_range_slices("Keyspace1", col_parent,
>>>               predicate, range, consistency)
>>>
>>>
>>> Below is the result of calling read_range with different start values.
>>> I've also included the debug log for each call, the line starting with
>>> "reading RangeSliceCommand" seems to show that key hash for "object2" is
>>> greater than "object3".
>>>
>>> #expect to return objects 1,2 and 3
>>>
>>> In [37]: cass_test.read_range(conn, "object1", "")
>>> Out[37]:
>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595268837,
>>> name='col_name', value='col_value'), super_column=None)], key='object1'),
>>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693,
>>> name='col_name', value='col_value'), super_column=None)], key='object3')]
>>>
>>> DEBUG 09:29:59,791 range_slice
>>> DEBUG 09:29:59,791 RangeSliceCommand{keyspace='Keyspace1',
>>> column_family='Standard1', super_column=null,
>>> predicate=SlicePredicate(column_names:[...@257b40fe]),
>>> range=[121587881847328893689247922008234581399,0], max_keys=1000}
>>> DEBUG 09:29:59,791 Adding to restricted ranges
>>> [121587881847328893689247922008234581399,0] for
>>> (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
>>> DEBUG 09:29:59,791 reading RangeSliceCommand{keyspace='Keyspace1',
>>> column_family='Standard1', super_column=null,
>>> predicate=SlicePredicate(column_names:[...@257b40fe]),
>>> range=[121587881847328893689247922008234581399,0], max_keys=1000} from
>>> 1...@localhost/127.0.0.1
>>> DEBUG 09:29:59,791 Sending RangeSliceReply{rows=Row(key='object1',
>>> cf=ColumnFamily(Standard1
>>> [636f6c5f6e616d65:false:9...@1272315595268837,])),Row(key='object3',
>>> cf=ColumnFamily(Standard1
>>> [636f6c5f6e616d65:false:9...@1272315595272693,]))}
>>> to 1...@localhost/127.0.0.1
>>> DEBUG 09:29:59,791 Processing response on a callback from
>>> 1...@localhost/127.0.0.1
>>> DEBUG 09:29:59,791 range slices read object1
>>> DEBUG 09:29:59,791 range slices read object3
>>>
>>>
>>> In [38]: cass_test.read_range(conn, "object2", "")
>>> Out[38]:
>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595271798,
>>> name='col_name', value='col_value'), super_column=None)], key='object2'),
>>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595268837,
>>> name='col_name', value='col_value'), super_column=None)], key='object1'),
>>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693,
>>> name='col_name', value='col_value'), super_column=None)], key='object3')]
>>>
>>> DEBUG 09:34:48,133 range_slice
>>> DEBUG 09:34:48,133 RangeSliceCommand{keyspace='Keyspace1',
>>> column_family='Standard1', super_column=null,
>>> predicate=SlicePredicate(column_names:[...@7966340c]),
>>> range=[28312518014678916505369931620527723964,0], max_keys=1000}
>>> DEBUG 09:34:48,133 Adding to restricted ranges
>>> [28312518014678916505369931620527723964,0] for
>>> (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
>>> DEBUG 09:34:48,133 reading RangeSliceCommand{keyspace='Keyspace1',
>>> column_family='Standard1', super_column=null,
>>> predicate=SlicePredicate(column_names:[...@7966340c]),
>>> range=[28312518014678916505369931620527723964,0], max_keys=1000} from
>>> 1...@localhost/127.0.0.1
>>> DEBUG 09:34:48,133 Sending RangeSliceReply{rows=Row(key='object2',
>>> cf=ColumnFamily(Standard1
>>> [636f6c5f6e616d65:false:9...@1272315595271798,])),Row(key='object1',
>>> cf=ColumnFamily(Standard1
>>> [636f6c5f6e616d65:false:9...@1272315595268837,])),Row(key='object3',
>>> cf=ColumnFamily(Standard1
>>> [636f6c5f6e616d65:false:9...@1272315595272693,]))}
>>> to 1...@localhost/127.0.0.1
>>> DEBUG 09:34:48,133 Processing response on a callback from
>>> 1...@localhost/127.0.0.1
>>> DEBUG 09:34:48,133 range slices read object2
>>> DEBUG 09:34:48,133 range slices read object1
>>> DEBUG 09:34:48,133 range slices read object3
>>>
>>>
>>> In [39]: cass_test.read_range(conn, "object3", "")
>>> Out[39]:
>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693,
>>> name='col_name', value='col_value'), super_column=None)], key='object3')]
>>>
>>> DEBUG 09:35:26,090 range_slice
>>> DEBUG 09:35:26,090 RangeSliceCommand{keyspace='Keyspace1',
>>> column_family='Standard1', super_column=null,
>>> predicate=SlicePredicate(column_names:[...@24e33e18]),
>>> range=[123092639156685888118746480803115294277,0], max_keys=1000}
>>> DEBUG 09:35:26,090 Adding to restricted ranges
>>> [123092639156685888118746480803115294277,0] for
>>> (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
>>> DEBUG 09:35:26,090 reading RangeSliceCommand{keyspace='Keyspace1',
>>> column_family='Standard1', super_column=null,
>>> predicate=SlicePredicate(column_names:[...@24e33e18]),
>>> range=[123092639156685888118746480803115294277,0], max_keys=1000} from
>>> 1...@localhost/127.0.0.1
>>> DEBUG 09:35:26,090 Sending RangeSliceReply{rows=Row(key='object3',
>>> cf=ColumnFamily(Standard1
>>> [636f6c5f6e616d65:false:9...@1272315595272693,]))}
>>> to 1...@localhost/127.0.0.1
>>> DEBUG 09:35:26,090 Processing response on a callback from
>>> 1...@localhost/127.0.0.1
>>> DEBUG 09:35:26,090 range slices read object3
>>>
>>>
>>>
>>> thanks
>>> Aaron
>>>
>>>
>>>
>>>
>>> On Sun, 25 Apr 2010 20:23:05 -0700, aaron <aa...@the-mortons.org>
>>> wrote:
>>>>
>>>> I've been looking at the get_range_slices feature and have found some
>>>> odd behaviour I do not understand. Basically the keys returned in a
>>>> range query do not match what I would expect to see. I think it may have
>>>> something to do with the ordering of keys that I don't know about, but
>>>> I'm just guessing.
>>>>
>>>> On Cassandra v 0.6.1, single node local install; RandomPartitioner.
>>>> Using
>>>> Python and my own thin wrapper around the Thrift Python API.
>>>>
>>>> Step 1.
>>>>
>>>> Insert 3 keys into the "Standard 1" column family, called "object 1",
>>>> "object 2" and "object 3", each with a single column called 'name' with
>>>> a value like 'object1'.
>>>>
>>>> Step 2.
>>>>
>>>> Do a get_range_slices call in the "Standard 1" CF, for column names
>>>> ["name"] with start_key "object1" and end_key "object3". I expect to see
>>>> three results, but I only see results for object1 and object2. Below are
>>>> the thrift types I'm passing into the Cassandra.Client object...
>>>>
>>>> - ColumnParent(column_family='Standard1', super_column=None)
>>>> - SlicePredicate(column_names=['name'], slice_range=None)
>>>> - KeyRange(end_key='object3', start_key='object1', count=4000,
>>>> end_token=None, start_token=None)
>>>>
>>>> and the output
>>>>
>>>>
>>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439,
>>>> name='name', value='object1'), super_column=None)], key='object1'),
>>>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362,
>>>> name='name', value='object3'), super_column=None)], key='object3')]
>>>>
>>>> Step 3.
>>>>
>>>> Modify the get_range_slices call, so the start_key is object2. In this
>>>> case I expect to see 2 rows returned, but I get 3. Thrift args and
>>>> return are below...
>>>>
>>>> - ColumnParent(column_family='Standard1', super_column=None)
>>>> - SlicePredicate(column_names=['name'], slice_range=None)
>>>> - KeyRange(end_key='object3', start_key='object2', count=4000,
>>>> end_token=None, start_token=None)
>>>>
>>>> and the output
>>>>
>>>>
>>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250265190715,
>>>> name='name', value='object2'), super_column=None)], key='object2'),
>>>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439,
>>>> name='name', value='object1'), super_column=None)], key='object1'),
>>>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362,
>>>> name='name', value='object3'), super_column=None)], key='object3')]
>>>>
>>>>
>>>>
>>>> Can anyone explain these odd results? As I said I've got my own python
>>>> wrapper around the client, so I may be doing something wrong. But I've
>>>> pulled out the thrift objects and they go in and out of the thrift
>>>> Cassandra.Client, so I think I'm ok. (I have not noticed a systematic
>>>> problem with my wrapper).
>>>>
>>>> On a more general note, is there information on the sort order of keys
>>>> when using key ranges? I'm guessing the hash of the keys is compared, and
>>>> I'm wondering if the hashes of the keys maintain the order of the
>>>> original values? Also I assume the order is byte order, rather than
>>>> ascii or utf8.
>>>>
>>>> I was experimenting with the difference between column slicing and key
>>>> slicing. In my case I could write the keys in as column names (they are
>>>> in buckets) as well and slice there first, then use the results to make
>>>> a multi key get. I'm trying to support features like, get me all the
>>>> data where the key starts with "foo.bar".
>>>>
>>>> Thanks for the fun project.
>>>>
>>>> Aaron
>>
>>
