RE: keyrange for get_range_slices

Dop Sun Thu, 10 Jun 2010 16:19:38 -0700

Thanks for your quick and detailed explanation of the key scan. This is
really helpful!

Dop

From: Philip Stanhope [mailto:pstanh...@wimba.com] 
Sent: Thursday, June 10, 2010 10:40 PM
To: user@cassandra.apache.org
Subject: Re: keyrange for get_range_slices

No ... and I personally don't have a problem with this if you think about
what is actually going on under the covers.

Note, however, that this is an expensive operation. As a result, if there
are parallel updates to the indexes while you are performing a full keyscan
(rowscan), you may miss keys, because they can be inserted earlier in the
index than the position you are currently processing.

A further concern is that the keys (and indexes) are spread around a
cluster. Unless R=N you will be hitting the network during this type of
scan.

Lastly, be careful about how you specify the SlicePredicate. A keyscan can
easily turn into a "dump the entire datastore" if you aren't careful.

On Jun 10, 2010, at 10:03 AM, Dop Sun wrote:

Hi,

As documented at http://wiki.apache.org/cassandra/API, both ends of the key
range for get_range_slices are inclusive.

As discussed in this thread:
http://groups.google.com/group/jassandra-user/browse_thread/thread/c2e56453cde067d3
there is a case where a user wants to discover all keys (a huge number) in a
column family.

My idea is to do this in batches: use an empty string as both start and
finish for the first query, then use the last key returned as the start of
the second query, and so on.

My question is: with this method, the last key returned by the first query
will be returned again as the first key of the second query, which is a
duplicate. Is there any other API to discover keys without duplicates in the
current implementation?
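
For illustration only, here is a minimal sketch of the batched scan described
above, with the repeated boundary key dropped on the client side. It does not
call Cassandra: fetch_key_range is a hypothetical stand-in for a
get_range_slices call that is inclusive on both ends, and it assumes keys come
back in a stable sorted order (an order-preserving layout), with no concurrent
writes.

```python
from bisect import bisect_left


def fetch_key_range(sorted_keys, start, finish, count):
    """Hypothetical stand-in for get_range_slices (not the real API).

    Returns up to `count` keys in [start, finish], both ends inclusive.
    An empty string means "unbounded" on that side.
    """
    lo = bisect_left(sorted_keys, start) if start else 0
    out = []
    for key in sorted_keys[lo:]:
        if finish and key > finish:
            break
        out.append(key)
        if len(out) == count:
            break
    return out


def scan_all_keys(sorted_keys, batch_size=3):
    """Page through every key, skipping the duplicated boundary key."""
    if batch_size < 2:
        # With batch_size 1 every page would only repeat the start key.
        raise ValueError("batch_size must be at least 2 to make progress")
    start = ""
    while True:
        page = fetch_key_range(sorted_keys, start, "", batch_size)
        # Because the range is inclusive, a page that starts at the previous
        # page's last key repeats that key; drop it before yielding.
        if start and page and page[0] == start:
            fresh = page[1:]
        else:
            fresh = page
        yield from fresh
        if len(page) < batch_size:
            return  # a short page means the end of the key space
        start = page[-1]


keys = sorted(["a", "b", "c", "d", "e", "f", "g"])
all_keys = list(scan_all_keys(keys, batch_size=3))
```

Each page after the first effectively yields batch_size - 1 new keys, so the
cost of the overlap is one extra key per request.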

Thanks,

Regards,

Dop
