Thank you very much, one question. You wrote that I do not need DISTINCT here since it's part of the primary key, but only the *combination* is unique (*PRIMARY KEY (id, timestamp)*). Also, if I take the last token and feed it back in as you showed, wouldn't I get overlapping boundaries?
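[For readers following along: the page-boundary question above can be illustrated with a short sketch. This is not from the thread; it uses an in-memory list as a stand-in for the table, and the ids and tokens are made up. A real scan would issue the same `token(id) >= ?` query through a CQL driver. The point is that the boundary token *is* re-fetched on the next page, and the overlap is resolved by skipping ids already seen at that token.]

```python
# Sketch of the token-paging loop, with a deliberate token collision
# ("c" and "d" both at token 20) straddling a page boundary.
# ROWS stands in for the table, sorted by token(id); all values are invented.

PAGE_SIZE = 3

ROWS = [
    ("a", -50), ("b", -10), ("c", 20),
    ("d", 20), ("e", 35), ("f", 90),
]

def fetch_page(min_token, limit=PAGE_SIZE):
    """Stand-in for: SELECT id, token(id) FROM table
       WHERE token(id) >= min_token LIMIT limit."""
    return [r for r in ROWS if r[1] >= min_token][:limit]

def scan_all_ids(start_token=-(2**63)):
    """Return every partition key exactly once, in token order.

    Assumes fewer than PAGE_SIZE ids share any single token value.
    """
    boundary = start_token
    seen_at_boundary = set()  # ids already returned for the boundary token
    ids = []
    while True:
        page = fetch_page(boundary)
        # Skip duplicates carried over from the previous page: rows whose
        # token equals the boundary and whose id we already emitted.
        fresh = [(i, t) for i, t in page
                 if not (t == boundary and i in seen_at_boundary)]
        if not fresh:
            break
        ids.extend(i for i, _ in fresh)
        boundary = fresh[-1][1]
        seen_at_boundary = {i for i, t in fresh if t == boundary}
    return ids

print(scan_all_ids())  # prints ['a', 'b', 'c', 'd', 'e', 'f']
```

So the pages do overlap at the boundary token, but only by the rows sharing that exact token value, and those are filtered out on the next iteration.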
On Sun, Aug 20, 2017 at 6:18 PM, Eric Stevens <migh...@gmail.com> wrote:

> You should be able to fairly efficiently iterate all the partition keys
> like:
>
>     select id, token(id) from table where token(id) >= -9204925292781066255 limit 1000;
>
>     id                                         | system.token(id)
>     -------------------------------------------+----------------------
>     ...
>     0xb90ea1db5c29f2f6d435426dccf77cca6320fac9 | -7821793584824523686
>
> Take the last token you receive and feed it back in, skipping duplicates
> from the previous page (on the unlikely chance that you have two IDs with
> a token collision on the page boundary):
>
>     select id, token(id) from table where token(id) >= -7821793584824523686 limit 1000;
>
>     id                                         | system.token(id)
>     -------------------------------------------+---------------------
>     ...
>     0xc6289d729c9087fb5a1fe624b0b883ab82a9bffe | -434806781044590339
>
> Continue until you have no more results. You don't really need DISTINCT
> here: it's part of your primary key, so it must already be distinct.
>
> If you want to parallelize it, split the ring into *n* ranges and include
> it as an upper bound for each segment:
>
>     select id, token(id) from table where token(id) >= -9204925292781066255 AND token(id) < $rangeUpperBound limit 1000;
>
> On Sun, Aug 20, 2017 at 12:33 AM Avi Levi <a...@indeni.com> wrote:
>
>> I need to get all unique keys (not the complete primary key, just the
>> partition key) in order to aggregate all the relevant records of that key
>> and apply some calculations on it.
>>
>> *CREATE TABLE my_table (
>>     id text,
>>     timestamp bigint,
>>     value double,
>>     PRIMARY KEY (id, timestamp) )*
>>
>> I know that a query like
>>
>> *SELECT DISTINCT id FROM my_table*
>>
>> is not very efficient, but how about the approach presented here
>> <http://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/>,
>> sending queries in parallel and using the token:
>>
>> *SELECT DISTINCT id FROM my_table WHERE token(id) >= -9204925292781066255
>> AND token(id) <= -9223372036854775808;*
>>
>> Or I can just maintain another table with the unique keys:
>>
>> *CREATE TABLE id_only (
>>     id text,
>>     PRIMARY KEY (id) )*
>>
>> but I tend not to, since it is error prone and will force other procedures
>> to maintain data integrity between those two tables.
>>
>> Any ideas?
>>
>> Thanks
>> Avi
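[Editorial note, not part of the thread: the parallel approach in the ScyllaDB post linked above relies on splitting the token ring into *n* contiguous ranges, one per worker. A minimal sketch of that split, assuming the default Murmur3Partitioner whose tokens span -2^63 to 2^63-1; `split_ring` is a hypothetical helper, not a driver API.]

```python
# Split the Murmur3 token ring into n contiguous, inclusive (lo, hi)
# ranges for parallel "WHERE token(id) >= lo AND token(id) <= hi" scans.

MIN_TOKEN = -(2**63)      # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1     # Murmur3Partitioner maximum token

def split_ring(n):
    """Return n (lo, hi) inclusive token ranges covering the whole ring."""
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible tokens
    ranges = []
    for i in range(n):
        lo = MIN_TOKEN + (total * i) // n
        hi = MIN_TOKEN + (total * (i + 1)) // n - 1
        ranges.append((lo, hi))
    return ranges

# One query per worker; each would then page within its range as above.
for lo, hi in split_ring(4):
    print(f"SELECT DISTINCT id FROM my_table "
          f"WHERE token(id) >= {lo} AND token(id) <= {hi};")
```

The ranges are adjacent and non-overlapping, so each partition key is visited by exactly one worker. (Note that the example query quoted in the thread has its bounds inverted: -9223372036854775808 is the ring minimum, so it belongs on the `>=` side.)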