I'm unable to reproduce this in pycassa starting with a clean database. Are you doing anything else to these rows besides inserting them?
Here's the complete script I'm using. Could you confirm that it causes problems for you?

- Tyler

=========

import sys

import pycassa

pool = pycassa.ConnectionPool('Keyspace1')
cf = pycassa.ColumnFamily(pool, 'Super1')

KEY = 'key'

columns = [
    "20031210020333/190209-20031210-4476807-s/" ,   #0
    "20031210020333/190209-20031210-4476807-s/0" ,  #1
    "20031210021940/190209-20031210-4476883-s/" ,   #2
    "20031210021940/190209-20031210-4476883-s/0" ,  #3
    "20031210022059/190209-20031210-4476885-s/" ,   #4
    "20031210022059/190209-20031210-4476885-s/0" ,  #5
    # <--Problem_around_here.
    "20031210022154/190209-20031210-4476888-s/" ,   #6
    "20031210022154/190209-20031210-4476888-s/0"    #7
]

for supercolumn in columns:
    cf.insert(KEY, {supercolumn: {'subcol': 'subval', 'subcol2': 'subval'}})

def get_cols(start_date, end_date, reversed):
    for key, cols in cf.get_range(start=KEY, finish=KEY,
                                  column_reversed=reversed, column_count=10000,
                                  column_start=start_date, column_finish=end_date):
        for supercol, subcols in cols.iteritems():
            print "col='%s' \tlen = %d" % (supercol, len(subcols))

start = 0
for end in [0, 3, 5, 7]:
    print "\nstart %d, end %d + 'z'" % (start, end)
    get_cols(columns[start], columns[end] + 'z', False)

end = 0
for start in [0, 3, 5, 7]:
    print "\nstart %d + 'z', end %d (reversed)" % (start, end)
    # A reversed slice starts from the larger name and walks down to the smaller one.
    get_cols(columns[start] + 'z', columns[end], True)

On Thu, Feb 17, 2011 at 11:09 PM, Shotaro Kamio <kamios...@gmail.com> wrote:
> Hi Aaron,
>
> Range slice means get_range_slices() in the Thrift API,
> createSuperSliceQuery in Hector, and get_range() in pycassa. The example
> code in pycassa is attached below.
>
> The problem is a little complicated to explain, so I'll try to
> describe it with examples.
> Here are 8 super column names that exist under this key. The
> list is in forward order.
>
> #0: "20031210020333/190209-20031210-4476807-s/"
> #1: "20031210020333/190209-20031210-4476807-s/0"
> #2: "20031210021940/190209-20031210-4476883-s/"
> #3: "20031210021940/190209-20031210-4476883-s/0"
> #4: "20031210022059/190209-20031210-4476885-s/"
> #5: "20031210022059/190209-20031210-4476885-s/0" <-- Problem around here.
> #6: "20031210022154/190209-20031210-4476888-s/"
> #7: "20031210022154/190209-20031210-4476888-s/0"
>
> There is no problem if I use super column names that exist on the key.
>
> * Range from #0 to #3 in forward order -> OK
> * Range from #0 to #5 in forward order -> OK
> * Range from #0 to #7 in forward order -> OK
>
> * Range from #7 to #0 in reverse order -> OK
> * Range from #5 to #0 in reverse order -> OK
> * Range from #3 to #0 in reverse order -> OK
>
>
> However, because I want to scan orders within a certain range, I use
> column names with the character "z" appended (higher than any character
> in order_id). Those column names are listed below as #1z, #3z, #5z and
> #7z. Note that these super column names don't actually exist on the key.
> (#4+ is a column name that falls between #4 and #5.)
>
> #0 : "20031210020333/190209-20031210-4476807-s/"
> #1 : "20031210020333/190209-20031210-4476807-s/0"
> #1z: "20031210020333/190209-20031210-4476807-s/z" (doesn't exist)
> #2 : "20031210021940/190209-20031210-4476883-s/"
> #3 : "20031210021940/190209-20031210-4476883-s/0"
> #3z: "20031210021940/190209-20031210-4476883-s/z" (doesn't exist)
> #4 : "20031210022059/190209-20031210-4476885-s/"
> #4+: "20031210022059/190209-20031210-4476885-s/+" (doesn't exist)
> #5 : "20031210022059/190209-20031210-4476885-s/0" <-- Problem around here.
> #5z: "20031210022059/190209-20031210-4476885-s/z" (doesn't exist)
> #6 : "20031210022154/190209-20031210-4476888-s/"
> #7 : "20031210022154/190209-20031210-4476888-s/0"
> #7z: "20031210022154/190209-20031210-4476888-s/z" (doesn't exist)
>
> Then I range slice them.
>
> * Range from #0 to #3z in forward order -> OK
> * Range from #0 to #4+ in forward order -> OK
> * Range from #0 to #5z in forward order -> OK
> * Range from #0 to #7z in forward order -> OK
>
> * Range from #7z to #0 in reverse order -> OK
> * Range from #5z to #0 in reverse order -> FAIL (no result)
> * Range from #4+ to #0 in reverse order -> OK
> * Range from #3z to #0 in reverse order -> OK
>
> The problem happens in this case. No error or warning is shown in the
> Cassandra log.
>
> I also tried dumping the data to JSON with sstable2json and restoring it
> with json2sstable, but the same problem occurs.
>
>
> The code I used for the test is something like this.
> ----------------------
> import sys
> import pycassa
>
> client = pycassa.connect(KEYSPACE, [ CASSANDRA_HOST ])
> cf = pycassa.ColumnFamily(client, COLUMN_FAMILY)
>
> columns = [
>     "20031210020333/190209-20031210-4476807-s/" ,   #0
>     "20031210020333/190209-20031210-4476807-s/0" ,  #1
>     "20031210021940/190209-20031210-4476883-s/" ,   #2
>     "20031210021940/190209-20031210-4476883-s/0" ,  #3
>     "20031210022059/190209-20031210-4476885-s/" ,   #4
>     "20031210022059/190209-20031210-4476885-s/0" ,  #5
>     # <--Problem_around_here.
>     "20031210022154/190209-20031210-4476888-s/" ,   #6
>     "20031210022154/190209-20031210-4476888-s/0"    #7
> ]
>
> reversed = False
> if len(sys.argv) > 1:
>     # Use reversed order if the "-r" option is given; "-f" or anything else
>     # means forward order. With no option, all column names are listed.
>     reversed = (sys.argv[1] == '-r')
>
> start_date = columns[0]
> end_date = columns[7] + "z"  # appending "z" triggers the problem.
>
> if reversed:
>     # A reversed slice must start from the larger name, so swap the bounds.
>     start_date, end_date = end_date, start_date
> elif len(sys.argv) == 1:
>     # No option: use an unbounded slice to list all column names.
>     start_date = end_date = ''
>
> print "start_date =", start_date, "end_date =", end_date, "reversed =", reversed
>
> for it in cf.get_range(start=A_KEY, finish=A_KEY,
>                        column_reversed=reversed, column_count=10000,
>                        column_start=start_date, column_finish=end_date):
>     for d in it[1].iteritems():
>         print "col='%s', len = %d" % (d[0], len(d[0]))
>
> -------------------------
>
>
> Regards,
> Shotaro
>
>
>
>
> On Fri, Feb 18, 2011 at 5:19 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
> > First, some terminology: when you say range slice, do you mean getting
> > multiple rows? Or do you mean get_slice, where you return multiple super
> > columns from one row?
> >
> > Your examples look like you want to get multiple super columns from one
> > row, in which case the choice of partitioner is not important. The
> > comparator and sub-comparator specified in the CF definition control the
> > ordering of columns. If possible, I would suggest using the random
> > partitioner.
> >
> > If you can provide examples of how you are doing the queries with pycassa,
> > we may be able to help.
> >
> > My initial guess is that the ranges you specify for the query are not
> > correct when using ASCII ordering for column names, e.g.
> >
> > 20031210 < 20031210022059/190209-20031210-4476885-s/z is true
> >
> > 20031210022059/190209-20031210-4476885-s/z < 20031210 is not true
> >
> > Try appending the highest-value ASCII character to the end of 20031210.
> >
> > Cheers
> > Aaron
> >
> > On 18/02/2011, at 4:35 AM, Shotaro Kamio <kamios...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> We are having trouble with strange behavior in Cassandra 0.7.2 (it also
> >> happened in 0.7.0). Could someone help us?
> >>
> >> The problem happens on a super column family named "Order".
> >> The data structure is something like:
> >> Order[ a_key ][ date + "/" + order_id + "/" (+ suffix) ][attribute] = value
> >>
> >> For example,
> >> Order[ "100" ][ "20031210022059/190209-20031210-4476885-s/" ]
> >> is a super column.
> >> Because we want to scan them in latest-first order, a range slice
> >> query with reversed order is used. (The partitioner is
> >> ByteOrderedPartitioner.)
> >>
> >> For some super columns in my Cassandra instance, a reversed query returns
> >> no result when it should return results.
> >> For instance,
> >>
> >> * A range slice in normal (lexical) order ( Order[ "100" ] [ from
> >> "20031210" to "20031210022059/190209-20031210-4476885-s/z" ] ) returns
> >> results correctly.
> >>
> >> col='20031210014347/190209-20031210-4476668-s/'
> >> col='20031210014347/190209-20031210-4476668-s/0'
> >> col='20031210022059/190209-20031210-4476885-s/'
> >> col='20031210022059/190209-20031210-4476885-s/0'
> >>
> >> * A range slice in reversed (latest-first) order ( Order[ "100" ] [ from
> >> "20031210022059/190209-20031210-4476885-s/z" to "20031210" ] ) returns
> >> NO result!
> >>
> >> Note that the super column name
> >> "20031210022059/190209-20031210-4476885-s/z" doesn't exist. The query
> >> should still work, and it succeeds for other super columns.
> >>
> >> * A range slice in reversed (latest-first) order starting from an existing
> >> column name ( Order[ "100" ] [ from
> >> "20031210022059/190209-20031210-4476885-s/0" to "20031210" ] ) returns
> >> the expected results.
> >>
> >> Both pycassa and Hector show the same behavior on the same column
> >> name. I suspect that Cassandra has a logic error.
> >>
> >>
> >> I'd appreciate any help.
> >>
> >>
> >> Best regards,
> >> Shotaro
> >
> >
>
> --
> Shotaro Kamio
>

--
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra Python client library
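To double-check the ordering Shotaro lists, here is a small sketch that sorts the eight real super column names together with the non-existent bounds (#1z, #3z, #4+, #5z, #7z). It assumes the column family compares super column names byte-wise (a BytesType or AsciiType comparator), which the thread implies through "ASCII ordering" but never states outright; the `BASE` and `labels` names are illustrative only.

```python
# Sketch: check byte-wise ordering of the super column names in this thread.
# Assumption: the CF uses a BytesType/AsciiType comparator for super columns.
BASE = [
    "20031210020333/190209-20031210-4476807-s/",  # #0
    "20031210021940/190209-20031210-4476883-s/",  # #2
    "20031210022059/190209-20031210-4476885-s/",  # #4
    "20031210022154/190209-20031210-4476888-s/",  # #6
]

# Real super columns (#0..#7) plus the non-existent query bounds.
labels = {
    "#0": BASE[0], "#1": BASE[0] + "0", "#1z": BASE[0] + "z",
    "#2": BASE[1], "#3": BASE[1] + "0", "#3z": BASE[1] + "z",
    "#4": BASE[2], "#4+": BASE[2] + "+", "#5": BASE[2] + "0",
    "#5z": BASE[2] + "z",
    "#6": BASE[3], "#7": BASE[3] + "0", "#7z": BASE[3] + "z",
}

# Sort the labels by the raw ASCII bytes of their names, as a byte comparator would.
order = sorted(labels, key=lambda label: labels[label].encode("ascii"))
print(" < ".join(order))
```

Under that assumption #5z sits strictly between #5 and #6 (and "+" sorts below "0", which places #4+ between #4 and #5), so a reversed slice from #5z down to #0 should return #5 through #0 rather than nothing.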
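The forward and reversed cases in the thread can be compared against a pure-Python model of a single-row slice. This is a hypothetical sketch of the semantics the posters expect (start, finish, reversed and count, as passed through pycassa's `column_start`, `column_finish`, `column_reversed` and `column_count`); it is not Cassandra's implementation, and it assumes byte-wise name comparison with an empty string meaning "unbounded".

```python
# Hypothetical model of a one-row slice; NOT Cassandra's actual code.
# Assumes byte-wise comparison of ASCII names; "" means an unbounded bound.
def slice_columns(names, start="", finish="", reverse=False, count=10000):
    result = []
    for name in sorted(names, reverse=reverse):
        if not reverse:
            if start and name < start:
                continue  # still before the start of a forward slice
            if finish and name > finish:
                break  # past the finish: stop scanning
        else:
            if start and name > start:
                continue  # still above the start of a reversed slice
            if finish and name < finish:
                break  # below the finish: stop scanning
        result.append(name)
        if len(result) >= count:
            break
    return result

columns = [
    "20031210020333/190209-20031210-4476807-s/",   # 0
    "20031210020333/190209-20031210-4476807-s/0",  # 1
    "20031210021940/190209-20031210-4476883-s/",   # 2
    "20031210021940/190209-20031210-4476883-s/0",  # 3
    "20031210022059/190209-20031210-4476885-s/",   # 4
    "20031210022059/190209-20031210-4476885-s/0",  # 5
    "20031210022154/190209-20031210-4476888-s/",   # 6
    "20031210022154/190209-20031210-4476888-s/0",  # 7
]

# The failing case from the thread: from #5z down to #0 in reverse order.
expected = slice_columns(columns, start=columns[5] + "z",
                         finish=columns[0], reverse=True)
# A reversed slice whose start is below its finish matches nothing here.
swapped = slice_columns(columns, start=columns[0],
                        finish=columns[5] + "z", reverse=True)
print(len(expected), len(swapped))
```

In this model the #5z to #0 reversed slice yields six names (#5 down to #0), while bounds given in the wrong order for the traversal direction, the kind of mistake Aaron's reply suspects, match nothing; a real server may reject such a range instead of returning an empty result.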
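The original report's layout, Order[ a_key ][ date + "/" + order_id + "/" (+ suffix) ], relies on lexical order matching time order so that a reversed slice reads latest-first. A minimal sketch of that idea, assuming the date part is a fixed-width YYYYMMDDhhmmss timestamp (as in the names shown) and using a hypothetical helper `make_supercolumn_name`:

```python
from datetime import datetime

# Hypothetical helper mirroring date + "/" + order_id + "/" (+ suffix).
def make_supercolumn_name(ts, order_id, suffix=""):
    # Fixed-width YYYYMMDDhhmmss keeps lexical order equal to chronological order.
    return ts.strftime("%Y%m%d%H%M%S") + "/" + order_id + "/" + suffix

early = datetime(2003, 12, 10, 1, 43, 47)
late = datetime(2003, 12, 10, 2, 20, 59)
names = [
    make_supercolumn_name(late, "190209-20031210-4476885-s"),
    make_supercolumn_name(early, "190209-20031210-4476668-s", "0"),
    make_supercolumn_name(late, "190209-20031210-4476885-s", "0"),
    make_supercolumn_name(early, "190209-20031210-4476668-s"),
]

# A reversed (latest-first) scan visits names in descending byte order.
latest_first = sorted(names, reverse=True)
for name in latest_first:
    print(name)
```

Within a single order the "/0" name comes before the bare "/" name in a reversed scan, because the bare name is a prefix of the suffixed one and therefore sorts lower.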
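Aaron's suggestion of appending the highest ASCII character to "20031210", and Shotaro's "z" suffix, both build an upper bound that sorts after every name sharing a prefix. A minimal sketch, assuming ASCII names compared byte-wise and that no name has DEL (0x7f) immediately after the prefix; `prefix_bounds` is a hypothetical helper, and the first and last entries in `names` are made-up neighbours for illustration.

```python
HIGHEST_ASCII = "\x7f"  # DEL, the largest 7-bit ASCII code point

def prefix_bounds(prefix):
    """Return (low, high) bracketing every ASCII name that starts with prefix.

    Assumes byte-wise comparison and that no name has DEL (0x7f) right
    after the prefix. For a reversed (latest-first) slice, high would be
    passed as column_start and low as column_finish.
    """
    return prefix, prefix + HIGHEST_ASCII

names = [
    "20031209235959/190209-20031209-0000001-s/",   # hypothetical: previous day
    "20031210014347/190209-20031210-4476668-s/",
    "20031210022059/190209-20031210-4476885-s/0",
    "20031211000001/190209-20031211-0000002-s/",   # hypothetical: next day
]

low, high = prefix_bounds("20031210")
# Keep only the names inside [low, high], then list them latest-first.
day = sorted((n for n in names if low <= n <= high), reverse=True)
print(day)
```

The "z" suffix works on the same principle one level down: every character that appears in these order IDs (digits, "-", "s") sorts below "z".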