Hi Aaron,

By "range slice" I mean get_range_slices() in the Thrift API, createSuperSliceQuery in Hector, and get_range() in pycassa. The example code in pycassa is attached below.

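As a rough model of what such a slice over super column names in one row is expected to return, the sketch below uses plain Python with no Cassandra involved. The helper name slice_columns and its parameters are made up for illustration. It assumes names are compared as plain ASCII strings and that both bounds are inclusive; it does not reproduce Cassandra's actual implementation.

```python
import bisect


def slice_columns(names, start, finish, reversed_order=False, count=100):
    """Hypothetical model of a single-row column slice.

    `start` and `finish` are inclusive bounds and need not exist in `names`.
    An empty string means "unbounded" on that side.
    """
    ordered = sorted(names)  # assumption: plain string (ASCII) ordering
    if reversed_order:
        # In a reversed slice, `start` is the upper bound and `finish` the lower.
        low, high = finish, start
    else:
        low, high = start, finish
    begin = bisect.bisect_left(ordered, low) if low else 0
    end = bisect.bisect_right(ordered, high) if high else len(ordered)
    selected = ordered[begin:end]
    if reversed_order:
        selected.reverse()
    return selected[:count]


if __name__ == "__main__":
    names = ["a/", "a/0", "b/", "b/0"]
    # "b/z" does not exist in `names`, but it is still a valid upper bound.
    print(slice_columns(names, "a/", "b/z"))
    print(slice_columns(names, "b/z", "a/", reversed_order=True))
```

Under this model a bound that does not exist in the row only marks a position in the ordering, so a forward slice and the matching reversed slice should return the same columns in opposite order.
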
The problem is a little complicated to explain, so I'll try to describe it with examples.

Here are 8 super column names that exist in the specific key. The list is in forward order.

  #0: "20031210020333/190209-20031210-4476807-s/"
  #1: "20031210020333/190209-20031210-4476807-s/0"
  #2: "20031210021940/190209-20031210-4476883-s/"
  #3: "20031210021940/190209-20031210-4476883-s/0"
  #4: "20031210022059/190209-20031210-4476885-s/"
  #5: "20031210022059/190209-20031210-4476885-s/0"  <-- Problem around here.
  #6: "20031210022154/190209-20031210-4476888-s/"
  #7: "20031210022154/190209-20031210-4476888-s/0"

There is no problem if I use super column names that exist on the key.

  * Range from #0 to #3 in forward order -> OK
  * Range from #0 to #5 in forward order -> OK
  * Range from #0 to #7 in forward order -> OK
  * Range from #7 to #0 in reverse order -> OK
  * Range from #5 to #0 in reverse order -> OK
  * Range from #3 to #0 in reverse order -> OK

However, because I want to scan orders in a certain range, I use column names with the character "z" appended (higher than anything in order_id). Those column names are listed below as #1z, #3z, #5z and #7z. Note that these super column names don't really exist on the key. (#4+ is a column name that sorts between #4 and #5.)

  #0 : "20031210020333/190209-20031210-4476807-s/"
  #1 : "20031210020333/190209-20031210-4476807-s/0"
  #1z: "20031210020333/190209-20031210-4476807-s/z"  (doesn't exist)
  #2 : "20031210021940/190209-20031210-4476883-s/"
  #3 : "20031210021940/190209-20031210-4476883-s/0"
  #3z: "20031210021940/190209-20031210-4476883-s/z"  (doesn't exist)
  #4 : "20031210022059/190209-20031210-4476885-s/"
  #4+: "20031210022059/190209-20031210-4476885-s/+"  (doesn't exist)
  #5 : "20031210022059/190209-20031210-4476885-s/0"  <-- Problem around here.
  #5z: "20031210022059/190209-20031210-4476885-s/z"  (doesn't exist)
  #6 : "20031210022154/190209-20031210-4476888-s/"
  #7 : "20031210022154/190209-20031210-4476888-s/0"
  #7z: "20031210022154/190209-20031210-4476888-s/z"  (doesn't exist)

Then I tried range slicing with them.

  * Range from #0 to #3z in forward order -> OK
  * Range from #0 to #4+ in forward order -> OK
  * Range from #0 to #5z in forward order -> OK
  * Range from #0 to #7z in forward order -> OK
  * Range from #7z to #0 in reverse order -> OK
  * Range from #5z to #0 in reverse order -> FAIL (no result)
  * Range from #4+ to #0 in reverse order -> OK
  * Range from #3z to #0 in reverse order -> OK

The problem happens in this one case. No error or warning is shown in the Cassandra log. I also tried dumping the data to JSON via sstable2json and restoring it with json2sstable, but the same problem occurs.

The code I used for the test is something like this.

----------------------
import sys

import pycassa

# KEYSPACE, CASSANDRA_HOST, COLUMN_FAMILY and A_KEY are placeholders.
client = pycassa.connect(KEYSPACE, [CASSANDRA_HOST])
cf = pycassa.ColumnFamily(client, COLUMN_FAMILY)

columns = [
    "20031210020333/190209-20031210-4476807-s/",   #0
    "20031210020333/190209-20031210-4476807-s/0",  #1
    "20031210021940/190209-20031210-4476883-s/",   #2
    "20031210021940/190209-20031210-4476883-s/0",  #3
    "20031210022059/190209-20031210-4476885-s/",   #4
    "20031210022059/190209-20031210-4476885-s/0",  #5  # <--Problem_around_here.
    "20031210022154/190209-20031210-4476888-s/",   #6
    "20031210022154/190209-20031210-4476888-s/0",  #7
]

reversed = False
if len(sys.argv) > 1:
    # Use reversed order if the "-r" option is given; "-f" or anything else
    # means forward order. With no option, all column names are listed.
    reversed = (sys.argv[1] == '-r')
    start_date = columns[0]
    end_date = columns[7] + "z"  # add "z" to trigger the problem
    if reversed:
        # A reversed slice starts from the higher bound.
        start_date, end_date = end_date, start_date
else:
    start_date = end_date = ''

print "start_date =", start_date, "end_date =", end_date, "reversed = ", reversed
for it in cf.get_range(start=A_KEY, finish=A_KEY,
                       column_reversed=reversed, column_count=10000,
                       column_start=start_date, column_finish=end_date):
    # it[0] is the row key, it[1] maps super column names to their contents.
    for d in it[1].iteritems():
        print "col='%s', len = %d" % (d[0], len(d[0]))
-------------------------

Regards,
Shotaro

On Fri, Feb 18, 2011 at 5:19 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
> First some terminology, when you say range slice do you mean getting multiple
> rows? Or do you mean get_slice where you return multiple super columns from
> one row?
>
> Your examples look like you want to get multiple super columns from one row.
> In which case the choice of partitioner is not important. The comparator and
> sub comparator as specified in the CF definition control the ordering of
> columns. If possible I would suggest using the random partitioner.
>
> Could you provide examples of how you are doing the queries using pycassa, we
> may be able to help.
>
> My initial guess is that the ranges you specify for the query are not correct
> when using ASCII ordering for column names, e.g.
>
> 20031210 < 20031210022059/190209-20031210-4476885-s/z is true
>
> 20031210022059/190209-20031210-4476885-s/z < 20031210 is not true
>
> Try appending the highest value ASCII character to the end of 20031210
>
> Cheers
> Aaron
>
> On 18/02/2011, at 4:35 AM, Shotaro Kamio <kamios...@gmail.com> wrote:
>
>> Hi,
>>
>> We are in trouble with a strange behavior in cassandra 0.7.2 (also
>> happened in 0.7.0). Could someone help us?
>>
>> The problem happens on a column family of super column type named "Order".
>> Data structure is something like:
>> Order[ a_key ][ date + "/" + order_id + "/" (+ suffix) ][attribute] = value
>>
>> For example,
>> Order[ "100" ][ "20031210022059/190209-20031210-4476885-s/" ]
>> is a super column.
>>
>> Because we want to scan them in latest-first order, a range slice
>> query with reversed order is used. (Partitioner is
>> ByteOrderedPartitioner).
>>
>> For some super columns in my cassandra instance, the reversed query
>> returns no result although it should have results.
>> For instance,
>>
>> * Range slice in normal (lexical) order ( Order[ "100" ] [ from
>> "20031210" to "20031210022059/190209-20031210-4476885-s/z" ] ) will
>> return results correctly.
>>
>> col='20031210014347/190209-20031210-4476668-s/'
>> col='20031210014347/190209-20031210-4476668-s/0'
>> col='20031210022059/190209-20031210-4476885-s/'
>> col='20031210022059/190209-20031210-4476885-s/0'
>>
>> * Range slice in reversed (latest-first) order ( Order[ "100" ] [ from
>> "20031210022059/190209-20031210-4476885-s/z" to "20031210" ] ) will
>> return NO result!
>>
>> Note that the super column name
>> "20031210022059/190209-20031210-4476885-s/z" doesn't exist. The query
>> should still work, and it succeeds for other super columns.
>>
>> * Range slice in reversed (latest-first) order starting from an existing
>> column name ( Order[ "100" ] [ from
>> "20031210022059/190209-20031210-4476885-s/0" to "20031210" ] ) will
>> return the results it should return.
>>
>> Both pycassa and hector show the same behavior on the same column
>> name. I guess that cassandra has some logical error.
>>
>>
>> I'll appreciate any help.
>>
>>
>> Best regards,
>> Shotaro
>

--
Shotaro Kamio