Hi Tyler,

Your script doesn't reproduce the problem, but the problem does occur
in a specific situation. My colleague analyzed it and found out how to
reproduce it. Please look at the JIRA ticket:
https://issues.apache.org/jira/browse/CASSANDRA-2212
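
To make the expected behavior concrete, here is a minimal sketch in plain Python. It does not talk to Cassandra and is not the pycassa API. It assumes the super column names are compared as plain ASCII strings; slice_columns and NAME_5Z are hypothetical names used only for illustration.

```python
from bisect import bisect_left, bisect_right

# The 8 super column names from the thread, in forward (ascending) order.
COLUMNS = [
    "20031210020333/190209-20031210-4476807-s/",   # 0
    "20031210020333/190209-20031210-4476807-s/0",  # 1
    "20031210021940/190209-20031210-4476883-s/",   # 2
    "20031210021940/190209-20031210-4476883-s/0",  # 3
    "20031210022059/190209-20031210-4476885-s/",   # 4
    "20031210022059/190209-20031210-4476885-s/0",  # 5
    "20031210022154/190209-20031210-4476888-s/",   # 6
    "20031210022154/190209-20031210-4476888-s/0",  # 7
]

# "#5z" from the thread: a name that does not exist, sorting after #5.
NAME_5Z = COLUMNS[4] + "z"


def slice_columns(names, start, finish, reversed_=False):
    """Return the names between start and finish, inclusive.

    `names` must be sorted ascending. In reversed mode `start` is the
    upper bound and `finish` the lower bound, and the result is
    returned latest-first. Neither bound has to exist in `names`.
    (Hypothetical helper; empty-string bounds are not handled.)
    """
    low, high = (finish, start) if reversed_ else (start, finish)
    selected = names[bisect_left(names, low):bisect_right(names, high)]
    return selected[::-1] if reversed_ else selected


# Forward: #0 to #5z. Reversed: #5z to #0 (the failing case in Cassandra).
forward = slice_columns(COLUMNS, COLUMNS[0], NAME_5Z)
backward = slice_columns(COLUMNS, NAME_5Z, COLUMNS[0], reversed_=True)

for name in backward:
    print(name)
```

Under that ordering assumption, the reversed slice from #5z to #0 should contain the same six names as the forward slice, only in the opposite order. In the failing case Cassandra returns no result for the reversed query.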

Best regards,
Shotaro

On Fri, Feb 18, 2011 at 3:59 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> I'm unable to reproduce this in pycassa starting with a clean database. Are
> you doing anything else to these rows besides inserting them?
>
> Here's the complete script I'm using below. Could you confirm that this
> causes problems for you?
>
> - Tyler
>
> =========
>
> import sys
> import pycassa
>
> pool = pycassa.ConnectionPool('Keyspace1')
> cf = pycassa.ColumnFamily(pool, 'Super1')
>
> KEY = 'key'
>
> columns = [
>     "20031210020333/190209-20031210-4476807-s/" , #0
>     "20031210020333/190209-20031210-4476807-s/0" , #1
>     "20031210021940/190209-20031210-4476883-s/" , #2
>     "20031210021940/190209-20031210-4476883-s/0" , #3
>     "20031210022059/190209-20031210-4476885-s/" , #4
>     "20031210022059/190209-20031210-4476885-s/0" , #5
>     # <--Problem_around_here.
>     "20031210022154/190209-20031210-4476888-s/" , #6
>     "20031210022154/190209-20031210-4476888-s/0" #7
> ]
>
> for supercolumn in columns:
>     cf.insert(KEY, {supercolumn: {'subcol': 'subval', 'subcol2': 'subval'}})
>
> def get_cols(start_date, end_date, reversed):
>     for key, cols in cf.get_range(start = KEY,
>                                   finish = KEY,
>                                   column_reversed=reversed,
>                                   column_count=10000,
>                                   column_start=start_date,
>                                   column_finish=end_date):
>         for supercol, subcols in cols.iteritems():
>             print "col='%s' \tlen = %d" % (supercol, len(subcols))
>
> start = 0
> for end in [0,3,5,7]:
>     print "\nstart %d, end %d + 'z'" % (start, end)
>     get_cols(columns[start], columns[end] + 'z', False)
>
> end = 0
> for start in [0, 3, 5, 7]:
>     print "\nstart %d + 'z', end %d (reversed)" % (start, end)
>     get_cols(columns[end], columns[start] + 'z', False)
>
>
> On Thu, Feb 17, 2011 at 11:09 PM, Shotaro Kamio <kamios...@gmail.com> wrote:
>>
>> Hi Aaron,
>>
>> Range slice means get_range_slices() in thrift api,
>> createSuperSliceQuery in hector, get_range() in pycassa. The example
>> code in pycassa is attached below.
>>
>> The problem is a little bit complicated to explain. I'll try to
>> describe in examples.
>> Here are 8 super column names which exist in the specific key. The
>> list is forward order.
>>
>> #0: "20031210020333/190209-20031210-4476807-s/"
>> #1: "20031210020333/190209-20031210-4476807-s/0"
>> #2: "20031210021940/190209-20031210-4476883-s/"
>> #3: "20031210021940/190209-20031210-4476883-s/0"
>> #4: "20031210022059/190209-20031210-4476885-s/"
>> #5: "20031210022059/190209-20031210-4476885-s/0" <-- Problem around here.
>> #6: "20031210022154/190209-20031210-4476888-s/"
>> #7: "20031210022154/190209-20031210-4476888-s/0"
>>
>> There is no problem if I use the super column names exist on the key.
>>
>> * Range from #0 to #3 in forward order -> OK
>> * Range from #0 to #5 in forward order -> OK
>> * Range from #0 to #7 in forward order -> OK
>>
>> * Range from #7 to #0 in reverse order -> OK
>> * Range from #5 to #0 in reverse order -> OK
>> * Range from #3 to #0 in reverse order -> OK
>>
>>
>> Because I want to scan orders in a certain range, however, I use
>> column names which added character "z" (higher than anything in
>> order_id). Those column names are listed below as #1z, #3z, #5z and
>> #7z. Note that these super column names don't really exist on the key.
>> (#4+ is a column name to locate between #4 and #5)
>>
>> #0 : "20031210020333/190209-20031210-4476807-s/"
>> #1 : "20031210020333/190209-20031210-4476807-s/0"
>> #1z: "20031210020333/190209-20031210-4476807-s/z" (don't exist)
>> #2 : "20031210021940/190209-20031210-4476883-s/"
>> #3 : "20031210021940/190209-20031210-4476883-s/0"
>> #3z: "20031210021940/190209-20031210-4476883-s/z" (don't exist)
>> #4 : "20031210022059/190209-20031210-4476885-s/"
>> #4+: "20031210022059/190209-20031210-4476885-s/+" (don't exist)
>> #5 : "20031210022059/190209-20031210-4476885-s/0" <-- Problem around here.
>> #5z: "20031210022059/190209-20031210-4476885-s/z" (don't exist)
>> #6 : "20031210022154/190209-20031210-4476888-s/"
>> #7 : "20031210022154/190209-20031210-4476888-s/0"
>> #7z: "20031210022154/190209-20031210-4476888-s/z" (don't exist)
>>
>> Then, try to range slice them.
>>
>> * Range from #0 to #3z in forward order -> OK
>> * Range from #0 to #4+ in forward order -> OK
>> * Range from #0 to #5z in forward order -> OK
>> * Range from #0 to #7z in forward order -> OK
>>
>> * Range from #7z to #0 in reverse order -> OK
>> * Range from #5z to #0 in reverse order -> FAIL (no result)
>> * Range from #4+ to #0 in reverse order -> OK
>> * Range from #3z to #0 in reverse order -> OK
>>
>> The problem happens in this case. No error or warning is shown in
>> cassandra log.
>>
>> Also, I tried dumping data into json via sstable2json and restored it
>> with json2sstable. But the same problem occurs.
>>
>>
>> The code I used for the test is something like this.
>> ----------------------
>> client = pycassa.connect(KEYSPACE, [ CASSANDRA_HOST ])
>> cf = pycassa.ColumnFamily(client, COLUMN_FAMILY)
>>
>> columns = [
>>     "20031210020333/190209-20031210-4476807-s/" , #0
>>     "20031210020333/190209-20031210-4476807-s/0" , #1
>>     "20031210021940/190209-20031210-4476883-s/" , #2
>>     "20031210021940/190209-20031210-4476883-s/0" , #3
>>     "20031210022059/190209-20031210-4476885-s/" , #4
>>     "20031210022059/190209-20031210-4476885-s/0" , #5
>>     # <--Problem_around_here.
>>     "20031210022154/190209-20031210-4476888-s/" , #6
>>     "20031210022154/190209-20031210-4476888-s/0" #7
>> ]
>>
>> reversed = False
>> if len(sys.argv) > 1:
>>     # use reversed order if "-r" option is given. "-f" or others for
>>     # forward order, no option will list all column names.
>>     reversed = (sys.argv[1] == '-r')
>>
>>     start_date = columns[0]
>>     end_date = columns[7] + "z" # add "z" to make problem.
>>
>>     if reversed:
>>         temp = start_date
>>         start_date = end_date
>>         end_date = temp
>>         pass
>> else:
>>     start_date = end_date = ''
>>     pass
>>
>> print "start_date =", start_date, "end_date =", end_date, "reversed = ", reversed
>>
>> for it in cf.get_range(start = A_KEY, finish = A_KEY,
>>                        column_reversed=reversed, column_count=10000,
>>                        column_start=start_date,
>>                        column_finish=end_date):
>>
>>     for d in it[1].iteritems():
>>         print "col='%s', len = %d" % (d[0], len(d[0]))
>>         pass
>>     pass
>>
>> -------------------------
>>
>>
>> Regards,
>> Shotaro
>>
>>
>> On Fri, Feb 18, 2011 at 5:19 AM, Aaron Morton <aa...@thelastpickle.com>
>> wrote:
>> > First some terminology: when you say range slice, do you mean getting
>> > multiple rows? Or do you mean get_slice, where you return multiple super
>> > columns from one row?
>> >
>> > Your examples look like you want to get multiple super columns from one
>> > row. In which case the choice of partitioner is not important. The
>> > comparator and sub comparator as specified in the CF definition control the
>> > ordering of columns. If possible I would suggest using the random
>> > partitioner.
>> >
>> > Could you provide examples of how you are doing the queries using
>> > pycassa? We may be able to help.
>> >
>> > My initial guess is that the ranges you specify for the query are not
>> > correct when using ASCII ordering for column names, e.g.
>> >
>> > 20031210 < 20031210022059/190209-20031210-4476885-s/z is true
>> >
>> > 20031210022059/190209-20031210-4476885-s/z < 20031210 is not true
>> >
>> > Try appending the highest value ASCII character to the end of
>> > 20031210
>> >
>> > Cheers
>> > Aaron
>> >
>> > On 18/02/2011, at 4:35 AM, Shotaro Kamio <kamios...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> We are in trouble with a strange behavior in cassandra 0.7.2 (also
>> >> happened in 0.7.0). Could someone help us?
>> >>
>> >> The problem happens on a column family of super column type named
>> >> "Order".
>> >> Data structure is something like:
>> >> Order[ a_key ][ date + "/" + order_id + "/" (+ suffix) ][attribute] =
>> >> value
>> >>
>> >> For example,
>> >> Order[ "100" ][ "20031210022059/190209-20031210-4476885-s/" ]
>> >> is a super column.
>> >> Because we want to scan them in the latest-first order, range slice
>> >> query with reversed order is used. (Partitioner is
>> >> ByteOrderedPartitioner).
>> >>
>> >> In some supercolumns in my cassandra instance, reversed query returns
>> >> no result while it should have results.
>> >> For instance,
>> >>
>> >> * Range slice in normal (lexical)-order ( Order[ "100" ] [ from
>> >> "20031210" to "20031210022059/190209-20031210-4476885-s/z" ] ) will
>> >> return results correctly.
>> >>
>> >> col='20031210014347/190209-20031210-4476668-s/'
>> >> col='20031210014347/190209-20031210-4476668-s/0'
>> >> col='20031210022059/190209-20031210-4476885-s/'
>> >> col='20031210022059/190209-20031210-4476885-s/0'
>> >>
>> >> * Range slice in reversed (latest-first)-order ( Order[ "100" ] [ from
>> >> "20031210022059/190209-20031210-4476885-s/z" to "20031210" ] ) will
>> >> return NO result!
>> >>
>> >> Note that the super column name
>> >> "20031210022059/190209-20031210-4476885-s/z" doesn't exist. The query
>> >> should still work. And, it succeeds in other super columns.
>> >>
>> >> * Range slice in reversed (latest-first)-order starting from existing
>> >> column name ( Order[ "100" ] [ from
>> >> "20031210022059/190209-20031210-4476885-s/0" to "20031210" ] ) will
>> >> return the results it should return.
>> >>
>> >> Both pycassa and hector show the same behavior on the same column
>> >> name. I guess that cassandra has some logical error.
>> >>
>> >>
>> >> I'll appreciate any help.
>> >>
>> >>
>> >> Best regards,
>> >> Shotaro
>> >
>>
>>
>>
>> --
>> Shotaro Kamio
>
>
>
> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
>
>

--
Shotaro Kamio