Hi Carlos,

In CQL, for the corner case you describe, you could simply do

    SELECT * FROM tbl WHERE key=#{key} LIMIT 1000;

and if it returns 1000 items, you'd iteratively do

    SELECT * FROM tbl WHERE key=#{key} AND column1 > #{last_col1_in_prev_query} LIMIT 1000;

Also, have a look at fetchSize here:
https://docs.datastax.com/en/developer/java-driver/2.0/java-driver/reference/queryBuilderOverview.html?scroll=queryBuilderOverview__setting-query-options-querybuilder-api

Hope this helps.

Cheers,
Jens

On Thu, Apr 21, 2016 at 5:59 PM Carlos Alonso <i...@mrcalonso.com> wrote:

> Hi guys.
>
> I've been struggling for the last days to find a reliable and stable way
> to count keys in a thrift column family.
>
> My idea is to basically iterate the whole ring using the token function,
> as documented here:
> https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html in
> batches of 10000 records.
>
> The only corner case is that if there were more than 10000 records in a
> single partition (not the case, but the program should still handle it) it
> explores the partition in depth by getting all records for that particular
> token (see below). In the end, all keys are saved into a hash to guarantee
> uniqueness. The count of unique keys is always different (and random,
> sometimes more keys, sometimes fewer are retrieved) and, of course, I'm sure
> no activity is going on in that cf.
>
> I'm running Cassandra 2.1.11 with the Murmur3 partitioner.
> RF=3 and CL=QUORUM
>
> The column family structure is
>
> CREATE TABLE tbl (
>   key blob,
>   column1 ascii,
>   value blob,
>   PRIMARY KEY(key, column1)
> )
>
> and I'm running the following script
>
> connection = open_cql_connection
> results = connection.execute("SELECT token(key), key FROM tbl LIMIT 10000")
>
> keys_hash = {} # Hash to save the keys to guarantee uniqueness
> last_token = nil
> token = nil
>
> while results != nil
>   results.each do |row|
>     keys_hash[row['key']] = true
>     token = row['token(key)']
>   end
>   if token == last_token
>     results = connection.execute("SELECT token(key), key FROM tbl WHERE token(key) = #{token}")
>   else
>     results = connection.execute("SELECT token(key), key FROM tbl WHERE token(key) >= #{token} LIMIT 10000")
>   end
>   last_token = token
> end
>
> puts keys_hash.keys.count
>
> What am I missing?
>
> Thanks!
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

--
Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.
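
P.S. For reference, the two queries from the reply (page through one partition with LIMIT, then continue from the last column1 seen) could be sketched in Ruby roughly as below. This is only an illustration under assumptions: each_row_in_partition, key_literal and page_size are made-up names, and connection is assumed to be any object whose execute(cql) returns an enumerable of row hashes, as in the quoted script. Since column1 is ascii, the sketch renders it as a quoted CQL string literal.

```ruby
# Sketch only: walk a single partition in pages of `page_size` rows.
# `connection` is assumed to respond to `execute(cql)` and return an
# enumerable of row hashes; `key_literal` is assumed to be the partition
# key already rendered as a CQL literal (e.g. a 0x... blob).
def each_row_in_partition(connection, key_literal, page_size: 1000)
  last_col = nil

  loop do
    cql = "SELECT * FROM tbl WHERE key=#{key_literal}"
    if last_col
      # column1 is ascii, so quote it and double any embedded single quotes.
      escaped = last_col.gsub("'", "''")
      cql += " AND column1 > '#{escaped}'"
    end
    cql += " LIMIT #{page_size}"

    rows = connection.execute(cql).to_a
    rows.each { |row| yield row }

    # A short page means the partition is exhausted.
    break if rows.size < page_size

    # Resume strictly after the last clustering column already seen.
    last_col = rows.last['column1']
  end
end
```

Because the continuation uses a strict > on column1, no row is fetched twice, and the loop stops as soon as a page comes back with fewer than page_size rows.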