I took this approach... reject the first result of subsequent get_range_slice requests. But if you look back at the output I posted (below), you'll notice that not all of the 30 keys [key1...key30] get listed! The iteration dies and can't proceed past key2.
1) The 1st batch gets 10 unique keys.
2) The 2nd batch gets only 9 unique keys, the 1st being a repeat.
3) The 3rd batch gets only 2 unique keys, again with the 1st being a repeat.

That means the iteration didn't see 9 of the keys in the CF. Key7 and Key30 are missing, for example.

[junit] Query w/ Range(,,10) result size: 10
[junit] key18
[junit] key23
[junit] key26
[junit] key27
[junit] key12
[junit] key28
[junit] key4
[junit] key3
[junit] key1
[junit] key24
[junit] Query w/ Range(key24,,10) result size: 10
[junit] key24
[junit] key5
[junit] key17
[junit] key29
[junit] key19
[junit] key8
[junit] key15
[junit] key22
[junit] key6
[junit] key25
[junit] Query w/ Range(key25,,10) result size: 3
[junit] key25
[junit] key14
[junit] key2
[junit] Query w/ Range(key2,,10), result size: 1
[junit] key2

-Adam

-----Original Message-----
From: sc...@scode.org on behalf of Peter Schuller
Sent: Fri 8/6/2010 6:43 PM
To: user@cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner

> I think this is actually the expected result, whenever you are using
> range_slices with start_key/end_key you must increment the last key
> you received and then use that in the next slice start_key. I also
> tried to use token because of exactly that behaviour and the doc
> talking about inclusive/exclusive.

Another way to do it is to filter results to exclude columns received twice due to being on iteration end points. This is useful because it is not always possible to increment or decrement (depending on iteration order) a column name (for example, in the case of byte strings, there is no defined maximum possible length, so the lexicographically "previous" column name might be infinitely long).

--
/ Peter Schuller
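For what it's worth, the filtering approach Peter describes can be sketched in a few lines. This is not real Thrift client code: `fetch_page` is a hypothetical stand-in for a `get_range_slices` call with an inclusive `start_key`, and the in-memory `ring` list simulates keys stored in RandomPartitioner token order. The point is only the pagination pattern: reuse the last key of each page as the next start key, drop the duplicated first result on every page after the first, and stop on a short page.

```python
def iterate_all_keys(fetch_page, page_size=10):
    """Yield every key in a column family via repeated range queries.

    fetch_page(start_key, count) is assumed to return up to `count` keys
    in token order, inclusive of start_key (a stand-in for a Thrift
    get_range_slices call -- not a real client API).
    """
    start = ""         # empty start key: begin at the first token
    first_page = True
    while True:
        page = fetch_page(start, page_size)
        if not page:
            break
        # start_key is inclusive, so every page after the first repeats
        # the previous page's last key: filter out that duplicate.
        for key in (page if first_page else page[1:]):
            yield key
        if len(page) < page_size:
            break      # short page: no keys remain past this range
        start = page[-1]
        first_page = False

# Simulated column family: keys in (random-partitioner) token order.
ring = ["key18", "key23", "key26", "key27", "key12", "key28", "key4",
        "key3", "key1", "key24", "key5", "key17", "key29", "key19",
        "key8", "key15", "key22", "key6", "key25", "key14", "key2",
        "key7", "key30"]

def fetch_page(start_key, count):
    i = 0 if start_key == "" else ring.index(start_key)
    return ring[i:i + count]

print(list(iterate_all_keys(fetch_page)))  # each key appears exactly once
```

Note the termination condition: a page shorter than `page_size` ends the loop, which avoids the degenerate final query (`Range(key2,,10)` returning only `key2`) that the output above runs into.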