Thanks for your quick and detailed explanation of the key scan. This is really helpful!
Dop

From: Philip Stanhope [mailto:pstanh...@wimba.com]
Sent: Thursday, June 10, 2010 10:40 PM
To: user@cassandra.apache.org
Subject: Re: keyrange for get_range_slices

No ... and I personally don't have a problem with this if you think about what is actually going on under the covers.

Note, however, that this is an expensive operation. As a result, if there are parallel updates to the indexes while you are performing a full keyscan (rowscan), you may miss keys, because they can be inserted earlier in the index than the point you are currently processing.

A further concern is that the keys (and indexes) are spread around a cluster. Unless R=N, you will be hitting the network during this type of scan.

Lastly, be careful about how you specify the SlicePredicate. A keyscan can easily turn into a "dump the entire datastore" if you aren't careful.

On Jun 10, 2010, at 10:03 AM, Dop Sun wrote:

Hi,

As documented at http://wiki.apache.org/cassandra/API, both ends of the key range for get_range_slices are inclusive.

As discussed in this thread: http://groups.google.com/group/jassandra-user/browse_thread/thread/c2e56453cde067d3, there is a case where a user wants to discover all keys (a huge number of them) in a column family.

What I have in mind is to do it in batches: first query with an empty string as both start and finish, then use the last key returned as the start of the second query.

My question is: with this method, the last key returned by the first query will be returned again as the first key of the second query, which is a duplicate. Is there any other API to discover keys without duplicates in the current implementation?

Thanks,

Regards,
Dop
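For reference, the batching approach from the original question can be handled on the client side by simply dropping the repeated boundary key. The sketch below is illustrative only: fake_get_range_slices is a hypothetical in-memory stand-in for get_range_slices (both ends inclusive, empty string meaning unbounded), not the real Thrift call, and it assumes keys come back in sorted order. It does not address the concurrent-insert or network-cost caveats raised above.

```python
from bisect import bisect_left
from typing import Iterator, List, Optional


def fake_get_range_slices(keys: List[str], start: str, finish: str, count: int) -> List[str]:
    """Hypothetical stand-in for get_range_slices over a sorted key list.

    Returns up to `count` keys k with start <= k <= finish (both ends
    inclusive). An empty start or finish means "unbounded" on that side.
    """
    i = bisect_left(keys, start) if start else 0
    out: List[str] = []
    while i < len(keys) and len(out) < count:
        if finish and keys[i] > finish:
            break
        out.append(keys[i])
        i += 1
    return out


def iter_all_keys(keys: List[str], page_size: int) -> Iterator[str]:
    """Page through every key, skipping the duplicated boundary key.

    Because the start key is inclusive, each page after the first begins
    with the previous page's last key, which has already been yielded.
    """
    if page_size < 2:
        # With page_size == 1 every later page would contain only the
        # already-seen boundary key, so the scan would never advance.
        raise ValueError("page_size must be at least 2")
    start = ""
    last_seen: Optional[str] = None  # boundary key from the previous page
    while True:
        page = fake_get_range_slices(keys, start, "", page_size)
        for k in page:
            if k == last_seen:
                continue  # inclusive start: drop the repeated key
            yield k
        if len(page) < page_size:
            return  # a short page means the range is exhausted
        start = last_seen = page[-1]
```

Note that each request after the first returns at most page_size - 1 new keys, so a somewhat larger page size keeps the overhead of the repeated key small.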