Hi Jens.

Thanks for your response, but my goal is to count distinct keys. So, if I
understood correctly, selecting WHERE key = #{key} won't give me any new
keys, right?
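
A minimal sketch of that point, not the actual script: since only distinct
keys matter, each page can resume strictly after the last token seen, so a
partition wider than the page never has to be read in depth. The helper
distinct_keys and the fetch_page callable are hypothetical names introduced
for illustration; with a real session fetch_page is assumed to wrap a query
such as SELECT token(key), key FROM tbl WHERE token(key) > ? LIMIT ?. The
table is faked in memory so the sketch runs on its own.

```ruby
require 'set'

# Hypothetical helper (not part of any driver): walks the ring in token order
# and returns the set of distinct partition keys.
#
# fetch_page is an assumed callable taking (after_token, limit) and returning
# an array of [token, key] pairs sorted by token. When after_token is nil it
# is expected to start from the beginning of the ring.
def distinct_keys(fetch_page, page_size = 10_000)
  keys = Set.new
  after_token = nil
  loop do
    rows = fetch_page.call(after_token, page_size)
    break if rows.empty?

    rows.each { |_token, key| keys << key }
    # Resume strictly after the last token: any rows left in that partition
    # share a key that is already in the set.
    after_token = rows.last.first
  end
  keys
end

# In-memory stand-in for the table: one [token, key] pair per CQL row, so
# key "b" is a partition with more rows than the page size used below.
ROWS = [[-50, 'a'], [10, 'b'], [10, 'b'], [10, 'b'], [10, 'b'], [70, 'c']].freeze

fake_fetch = lambda do |after_token, limit|
  ROWS.select { |token, _key| after_token.nil? || token > after_token }.first(limit)
end

puts distinct_keys(fake_fetch, 2).size # prints 3
```

One caveat with this approach: if two different keys ever hashed to the same
token and a page boundary fell between them, the second key would be skipped.
That should be very rare with Murmur3, but it is not impossible.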

Thanks!

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 25 April 2016 at 09:22, Jens Rantil <jens.ran...@tink.se> wrote:

> Hi Carlos,
>
> In CQL, for the corner case you describe, you could simply do
>
>     SELECT * FROM tbl WHERE key=#{key} LIMIT 1000;
>
> and if it returns 1000 items, you'd iteratively do
>
>     SELECT * FROM tbl WHERE key=#{key} AND column1 > #{last_col1_in_prev_query} LIMIT 1000;
>
> Also, have a look at fetchSize here:
> https://docs.datastax.com/en/developer/java-driver/2.0/java-driver/reference/queryBuilderOverview.html?scroll=queryBuilderOverview__setting-query-options-querybuilder-api
>
> Hope this helps.
>
> Cheers,
> Jens
>
> On Thu, Apr 21, 2016 at 5:59 PM Carlos Alonso <i...@mrcalonso.com> wrote:
>
>> Hi guys.
>>
>> I've been struggling for the last few days to find a reliable and stable
>> way to count keys in a thrift column family.
>>
>> My idea is to basically iterate the whole ring using the token function,
>> as documented here:
>> https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html in
>> batches of 10000 records.
>>
>> The only corner case is a single partition holding more than 10000
>> records (not the case here, but the program should still handle it). When
>> that happens, the script explores the partition in depth by fetching all
>> records for that particular token (see below). In the end, all keys are
>> saved into a hash to guarantee uniqueness. The count of unique keys is
>> different on every run (seemingly at random: sometimes more keys are
>> retrieved, sometimes fewer) and, of course, I'm sure no activity is going
>> on in that cf.
>>
>> I'm running Cassandra 2.1.11 with the Murmur3 partitioner, RF=3 and CL=QUORUM.
>>
>> The column family structure is:
>>
>> CREATE TABLE tbl (
>>     key blob,
>>     column1 ascii,
>>     value blob,
>>     PRIMARY KEY(key, column1)
>> )
>>
>> and I'm running the following script:
>>
>> connection = open_cql_connection
>> results = connection.execute("SELECT token(key), key FROM tbl LIMIT 10000")
>>
>> keys_hash = {} # Hash to save the keys to guarantee uniqueness
>> last_token = nil
>> token = nil
>>
>> while results != nil
>>   results.each do |row|
>>     keys_hash[row['key']] = true
>>     token = row['token(key)']
>>   end
>>   if token == last_token
>>     results = connection.execute("SELECT token(key), key FROM tbl WHERE token(key) = #{token}")
>>   else
>>     results = connection.execute("SELECT token(key), key FROM tbl WHERE token(key) >= #{token} LIMIT 10000")
>>   end
>>   last_token = token
>> end
>>
>> puts keys_hash.keys.count
>>
>> What am I missing?
>>
>> Thanks!
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
> --
>
> Jens Rantil
> Backend Developer @ Tink
>
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> For urgent matters you can reach me at +46-708-84 18 32.
>
