Thanks for your quick and detailed explanation of the key scan. This is really
helpful!

 

Dop

 

From: Philip Stanhope [mailto:pstanh...@wimba.com] 
Sent: Thursday, June 10, 2010 10:40 PM
To: user@cassandra.apache.org
Subject: Re: keyrange for get_range_slices

 

No ... and I personally don't have a problem with this if you think about
what is actually going on under the covers.

 

Note, however, that this is an expensive operation. If there are parallel
updates to the indexes while you are performing a full keyscan (rowscan), you
can miss keys that are inserted at a position in the index earlier than the
one you are currently processing.
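
To make the missed-key case concrete, here is a minimal sketch in Python. It does not talk to Cassandra: a sorted in-memory list stands in for the key index, and the function name scan_with_concurrent_insert and its parameters are hypothetical, chosen only for this illustration.

```python
from bisect import bisect_left, insort


def scan_with_concurrent_insert(index, page_size, insert_after_page, new_key):
    """Page through a sorted list of keys while simulating one concurrent write.

    `index` is a hypothetical in-memory stand-in for the key index, not a
    Cassandra API. After `insert_after_page` pages have been read, `new_key`
    is inserted into the index, as a parallel writer might do.
    """
    seen = []
    start = ""  # empty string means "from the beginning"
    pages_read = 0
    while True:
        lo = bisect_left(index, start) if start else 0
        page = index[lo:lo + page_size]
        # The start key is inclusive, so skip it on every page after the first.
        if start and page and page[0] == start:
            page = page[1:]
        if not page:
            break
        seen.extend(page)
        start = page[-1]
        pages_read += 1
        if pages_read == insert_after_page:
            insort(index, new_key)  # simulated parallel insert
    return seen


index = ["b", "d", "f", "h"]
seen = scan_with_concurrent_insert(index, page_size=2, insert_after_page=1, new_key="a")
print(seen)          # keys the scan actually visited
print("a" in index)  # the new key is now in the index
print("a" in seen)   # but the scan had already moved past its position
```

The key "a" sorts before the scan cursor at the time it is written, so the scan never returns to it.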

 

A further concern is that the keys (and indexes) are spread around a
cluster. Unless R=N you will be hitting the network during this type of
scan.

 

Lastly, be careful about how you specify the SlicePredicate. A keyscan can
easily turn into a "dump the entire datastore" if you aren't careful.

 

On Jun 10, 2010, at 10:03 AM, Dop Sun wrote:





Hi,

 

As documented at http://wiki.apache.org/cassandra/API, both ends of the key
range for get_range_slices are inclusive.

 

As discussed in this thread:
http://groups.google.com/group/jassandra-user/browse_thread/thread/c2e56453cde067d3,
there is a case where a user wants to discover all keys (a huge number) in a
column family.

 

My idea is to do this in batches: use the empty string as both start and
finish for the first query, then use the last key returned as the start of the
second query, and so on.

 

My question is: with this method, the last key returned by the first query is
returned again as the first key of the second query, which is a duplicate. Is
there any other API to discover keys without duplicates in the current
implementation?
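
For illustration, the batched approach with the duplicate dropped on the client side could be sketched as follows. This is only a sketch under stated assumptions: the get_range_slices function here is a hypothetical in-memory stand-in over a sorted list of keys (it is not the Thrift call and takes different arguments), and scan_all_keys is an invented helper name.

```python
from bisect import bisect_left


def get_range_slices(index, start, finish, count):
    """Hypothetical stand-in for a range query over a sorted list of keys.

    Returns up to `count` keys in [start, finish], both ends inclusive.
    An empty string for `start` or `finish` means "unbounded".
    """
    lo = bisect_left(index, start) if start else 0
    keys = index[lo:lo + count]
    if finish:
        keys = [k for k in keys if k <= finish]
    return keys


def scan_all_keys(index, page_size=3):
    """Yield every key once, paging with the last key seen as the next start."""
    if page_size < 2:
        # With one key per page, every later page would hold only the duplicate.
        raise ValueError("page_size must be at least 2")
    start = ""
    while True:
        page = get_range_slices(index, start, "", page_size)
        # The start key is inclusive, so after the first page the first
        # returned key repeats the previous page's last key: drop it.
        if start and page and page[0] == start:
            page = page[1:]
        if not page:
            return
        yield from page
        start = page[-1]


index = ["a", "b", "c", "d", "e", "f", "g"]
print(list(scan_all_keys(index, page_size=3)))
```

Each page after the first effectively delivers page_size - 1 new keys, which is the cost of the inclusive start key.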

 

Thanks,

Regards,

Dop

 
