Hi Jens. Thanks for your response but my idea is to count different keys, so, if I understood correctly selecting WHERE key = #{key} won't give me any new key, right?
Thanks! Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso> On 25 April 2016 at 09:22, Jens Rantil <jens.ran...@tink.se> wrote: > Hi Carlos, > > In CQL, for the cornercase you describe, you could simply do > > SELECT * FROM tbl WHERE key=#{key} LIMIT 1000; > > and if it returns 1000 items, you'd iteratively do > > SELECT * FROM tbl WHERE key=#{key} AND column1 > > #{last_col1_in_prev_query} LIMIT 1000; > > Also, have a look at fetchSize here: > https://docs.datastax.com/en/developer/java-driver/2.0/java-driver/reference/queryBuilderOverview.html?scroll=queryBuilderOverview__setting-query-options-querybuilder-api > > Hope this helps. > > Cheers, > Jens > > On Thu, Apr 21, 2016 at 5:59 PM Carlos Alonso <i...@mrcalonso.com> wrote: > >> Hi guys. >> >> I've been struggling for the last days to find a reliable and stable way >> to count keys in a thrift column family. >> >> My idea is to basically iterate the whole ring using the token function, >> as documented here: >> https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html in >> batches of 10000 records >> >> The only corner case is that if there were more than 10000 records in a >> single partition (not the case, but the program should still handle it) it >> explores the partition in depth by getting all records for that particular >> token (see below). In the end, all keys are saved into a hash to guarantee >> uniqueness. The count of unique keys is always different (and random, >> sometimes more keys, sometimes less are retrieved) and, of course, I'm sure >> no activity is going on in that cf. >> >> I'm running Cassandra 2.1.11 with MurMur3 partitioner. RF=3 and CL=QUORUM >> >> the column family structure is >> >> CREATE TABLE tbl ( >> key blob, >> column1 ascii, >> value blob, >> PRIMARY KEY(key, column1) >> ) >> >> and I'm running the following script >> >> connection = open_cql_connection >> results = connection.execute("SELECT token(key), key FROM tbl LIMIT >> 10000") >> >> keys_hash = {} // Hash to save the keys to guarantee uniqueness >> last_token = nil >> token = nil >> >> while results != nil >> results.each do |row| >> keys_hash[row['key']] = true >> token = row['token(key)'] >> end >> if token == last_token >> results = connection.execute("SELECT token(key), key FROM tbl WHERE >> token(key) = #{token}") >> else >> results = connection.execute("SELECT token(key), key FROM tbl WHERE >> token(key) >= #{token} LIMIT 10000") >> end >> last_token = token >> end >> >> puts keys.keys.count >> >> What am I missing? >> >> Thanks! >> >> Carlos Alonso | Software Engineer | @calonso >> <https://twitter.com/calonso> >> > -- > > Jens Rantil > Backend Developer @ Tink > > Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden > For urgent matters you can reach me at +46-708-84 18 32. >