If you are working inside the Cassandra code base, take a look at o.a.c.hadoop.ColumnFamilyRecordReader. It reads all the rows in a CF using tokens. I'm not sure that code cares too much about reading a row twice. AFAIK using tokens for this is considered an internal feature.
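
For the paging loop described in the quoted question, here is a minimal sketch of the general pattern. It is not the ColumnFamilyRecordReader code. fetch_range() is a hypothetical stand-in for a get_range_slices() call, backed by an in-memory dict so it can run on its own, and iter_all_rows() is likewise a made-up name. It assumes keys come back in sorted order (as with the OrderPreservingPartitioner), that the start key is inclusive, and that an empty string means "no bound" on that side, which AFAIK is how key ranges behave but is worth checking against your version.

```python
from bisect import bisect_left


def fetch_range(rows, start_key, end_key, count):
    """Hypothetical stand-in for get_range_slices().

    Returns up to `count` (key, value) pairs with
    start_key <= key <= end_key, in key order. An empty string
    means "unbounded" on that side (an assumption of this sketch).
    """
    keys = sorted(rows)
    lo = bisect_left(keys, start_key) if start_key else 0
    out = []
    for key in keys[lo:]:
        if end_key and key > end_key:
            break
        out.append((key, rows[key]))
        if len(out) == count:
            break
    return out


def iter_all_rows(rows, batch_size):
    """Yield every (key, value) exactly once, batch_size rows per fetch."""
    if batch_size < 2:
        # With an inclusive start key, a batch of 1 would only ever
        # return the row we already have, so the loop could not advance.
        raise ValueError("batch_size must be at least 2")
    start_key = ""  # empty start key: begin at the first row
    first = True
    while True:
        batch = fetch_range(rows, start_key, "", batch_size)
        # Every batch after the first begins with the last row of the
        # previous batch (inclusive start), so drop that repeated row.
        fresh = batch if first else batch[1:]
        for item in fresh:
            yield item
        if len(batch) < batch_size:
            return  # a short batch means the end of the range was reached
        start_key = batch[-1][0]
        first = False
```

The only state carried between fetches is the last key seen, so the end key can stay empty for the whole traversal. The cost is that each batch after the first re-reads one row.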
WRT the start key / end key issue, why not take a look at how the pycassa, phpcassa or hector libraries do it?

Aaron

On 22 Nov 2010, at 22:10, alta...@ceid.upatras.gr wrote:

> I am not using any client, I am trying to extend Cassandra with a new API
> call so that a _node_ will do that on behalf of clients. Thank you for the
> answer, but it doesn't answer my question!
>
> Alexander
>
>> Most of the high level clients do this for you.
>>
>> For example, pycassa and phpcassa both do this by returning an
>> iterator from get_range() and breaking it up behind the scenes.
>>
>> Hector also has something similar, but I think it's in the examples
>> section.
>>
>> What client are you using?
>>
>> (By the way, beta1 is old and buggy! You should switch to beta3.)
>>
>> - Tyler
>>
>> On Fri, Nov 19, 2010 at 8:33 AM, <alta...@ceid.upatras.gr> wrote:
>>
>>> Hello,
>>>
>>> I would like one of the cluster's nodes to use get_range_slices() to
>>> retrieve the values of a specific column for the entire keyspace. I
>>> obviously don't want to do it for the whole keyspace at once, so I'd like
>>> to do it in groups of n, which should be configurable.
>>>
>>> I get the first n values using a KeyRange with the current node's local
>>> token as start_token and end_token, which equals the whole keyspace.
>>>
>>> After that, it makes sense to have a loop, and to use each time a new
>>> KeyRange with the largest key returned by the previous iteration as the
>>> start_key. However, I don't know what to use as end_key, and Cassandra
>>> complains that if one of (start_key, end_key) is not null, the other can't
>>> be either. What can I do?
>>>
>>> Can I use tokens? I read that a KeyRange with tokens is end-inclusive, and
>>> can wrap, so I can just give the local node's token as the end_token all
>>> the time, so when the traversing reaches that node again, it will know the
>>> whole keyspace was traversed. Or are tokens different semantically?
>>>
>>> I am using Cassandra 0.7.0 beta1, and the OrderPreservingPartitioner.
>>>
>>> Alexander Altanis