My two keys that I send in my test program are 0xe695b0e69982e99693 and 0x666f6f, which decode to "数時間" and "foo" respectively.
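As a quick sanity check (a minimal Python sketch, independent of Cassandra), the hex really does decode that way:

```python
# Decode the two raw keys from their hex representation.
key1 = bytes.fromhex("e695b0e69982e99693")
key2 = bytes.fromhex("666f6f")

print(key1.decode("utf-8"))  # 数時間
print(key2.decode("utf-8"))  # foo
```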
So I ran my tests again: I started with a clean 0.6.13, wrote two rows with those two keys, drained, shut down, started 0.7.5, and imported my keyspace.

In my test program, when I do multi_get_slice with those two keys, I get back a data structure that contains the exact same keys, but only the structure under the key 0x666f6f contains any columns. When I do a simple get with the first key, I get a NotFoundException; the second key works fine. Doing get_range_slices, I get back two KeySlices with the exact same keys, and both have their columns.

If I run sstablekeys on the datafile, it prints out:

e695b0e69982e99693
666f6f

If I run sstable2json on the datafile, it prints out:

{
"e695b0e69982e99693": [["00", "01", 1304519723589, false]],
"666f6f": [["00", "01", 1304519721274, false]]
}

After that I re-inserted a row with the first key and then ran my tests again. Now both single gets work fine and multi_get_slice works fine, but get_range_slices returns a structure with three keys:

0xe695b0e69982e99693
0xe695b0e69982e99693
0x666f6f

I restarted Cassandra to make it flush the commitlog, and my data directory now has two datafiles. When I run sstablekeys on the first one it still prints out:

e695b0e69982e99693
666f6f

Running it on the second datafile prints out:

e695b0e69982e99693

After all that, I forced a compaction with nodetool and restarted the server, ending up with a single datafile. When I run sstable2json on that, it prints out:

{
"e695b0e69982e99693": [["00", "01", 1304519723589, false]],
"e695b0e69982e99693": [["00", "02", 1304521931818, false]],
"666f6f": [["00", "01", 1304519721274, false]]
}

So I now have an SSTable with two rows with identical keys, except one of the rows doesn't really work. So, now what? And how did I end up in this state?

/Henrik Schröder

On Tue, May 3, 2011 at 22:10, aaron morton <aa...@thelastpickle.com> wrote:
> Can you provide some details of the data returned when you do the
> get_range()?
> It will be interesting to see the raw bytes returned for
> the keys. The likely culprit is a change in the encoding. Can you also
> try to grab the bytes sent for the key when doing the single select that
> fails.
>
> You can grab these either on the client and/or by turning on DEBUG
> logging in conf/log4j-server.properties
>
> Thanks
> Aaron
>
> On 4 May 2011, at 03:19, Henrik Schröder wrote:
>
> > The way we solved this problem is that it turned out we had only a few
> > hundred rows with unicode keys, so we simply extracted them, upgraded to
> > 0.7, and wrote them back. However, this means that among the rows, there are
> > a few hundred weird duplicate rows with identical keys.
> >
> > Is this going to be a problem in the future? Is there a chance that the
> > good duplicate is cleaned out in favour of the bad duplicate so that we
> > suddenly lose those rows again?
> >
> >
> > /Henrik Schröder
> >
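Stepping back from the thread: Aaron's guess of an encoding change would explain the symptoms above. As a purely hypothetical illustration (this is NOT Cassandra's actual code, and signed-vs-unsigned byte comparison is an assumed example, not the confirmed cause), here is a minimal Python sketch of how an index built under one byte ordering but binary-searched under another makes a point lookup miss a row that a sequential scan still returns — the same get vs. get_range_slices split seen with the unicode key:

```python
import bisect

def signed(b):
    """Interpret one byte the way Java's signed `byte` type would."""
    return b - 256 if b >= 128 else b

k_unicode = "数時間".encode("utf-8")  # leading byte 0xe6 is negative when signed
k_ascii = b"foo"                      # leading byte 0x66 is positive either way

# Suppose the on-disk index was written under signed-byte ordering,
# which puts the unicode key *first*...
index = sorted([k_unicode, k_ascii], key=lambda k: [signed(b) for b in k])

# ...but the reader binary-searches with raw (unsigned) byte comparison,
# under which the unicode key should sort *last*.
def point_lookup(idx, key):
    i = bisect.bisect_left(idx, key)
    return i < len(idx) and idx[i] == key

print(point_lookup(index, k_unicode))  # False: like get() -> NotFoundException
print(k_unicode in index)              # True: a full scan (get_range_slices)
                                       # still finds the row
```

The point of the sketch is only the failure mode: any comparator mismatch between writer and reader makes binary-searched lookups unreliable while scans keep working, which also explains how a re-insert can land as a second, duplicate row.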