You should be able to fairly efficiently iterate all the partition keys
like:

select id, token(id) from table where token(id) >= -9204925292781066255
limit 1000;
 id                                         | system.token(id)
--------------------------------------------+----------------------
...
 0xb90ea1db5c29f2f6d435426dccf77cca6320fac9 | -7821793584824523686

Take the last token you receive and feed it back in, skipping duplicates
from the previous page (on the unlikely chance that you have two ID's with
a token collision on the page boundary):

select id, token(id) from table where token(id) >=
-7821793584824523686 limit 1000;
 id                                         | system.token(id)
--------------------------------------------+---------------------
...
 0xc6289d729c9087fb5a1fe624b0b883ab82a9bffe | -434806781044590339

Continue until you have no more results.  You don't really need distinct
here: it's part of your primary key, it must already be distinct.

If you want to parallelize it, split the ring into *n* ranges and include
it as an upper bound for each segment.

select id, token(id) from table where token(id) >= -9204925292781066255 AND
token(id) < $rangeUpperBound limit 1000;


On Sun, Aug 20, 2017 at 12:33 AM Avi Levi <a...@indeni.com> wrote:

> I need to get all unique keys (not the complete primary key, just the
> partition key) in order to aggregate all the relevant records of that key
> and apply some calculations on it.
>
> *CREATE TABLE my_table (
>
>     id text,
>
>     timestamp bigint,
>
>     value double,
>
>     PRIMARY KEY (id, timestamp) )*
>
> I know that to query like this
>
> *SELECT DISTINCT id FROM my_table *
>
> is not very efficient but how about the approach presented here 
> <http://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/>
>  sending queries in parallel and using the token
>
> *SELECT DISTINCT id FROM my_table WHERE token(id) >= -9204925292781066255 AND 
> token(id) <= -9223372036854775808; *
>
> *or I can just maintain another table with the unique keys *
>
> *CREATE TABLE id_only ( id text,
>
>     PRIMARY KEY (id) )*
>
> but I tend not to since it is error prone and will enforce other procedures 
> to maintain data integrity between those two tables .
>
> any ideas ?
>
> Thanks
>
> Avi
>
>

Reply via email to