Cassandra is not well suited to table-scan queries, which is what count(*) 
typically is. There are some attempts to make that work (as noted below), but 
this is a path I avoid.


Sean Durity

From: Max C [mailto:mc_cassan...@core43.com]
Sent: Saturday, April 09, 2016 6:19 PM
To: user@cassandra.apache.org
Subject: Re: 1, 2, 3...

Looks like this guy (Brian Hess) wrote a script to split the token range and 
run count(*) on each subrange:

https://github.com/brianmhess/cassandra-count

- Max

On Apr 8, 2016, at 10:56 pm, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

SELECT COUNT(*) probably works (with internal paging) on many datasets with 
enough time and assuming you don’t have any partitions that will kill you.

No, it doesn’t count extra replicas / duplicates.

The old way to do this (before paging / fetch size) was to use manual paging 
based on tokens/clustering keys:

https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html – SELECT’s 
WHERE clause can use token(), which is what you’d want to use to page through 
the whole token space.

You could, in theory, issue thousands of queries in parallel, each for a 
different token range, and then sum the results. That’s what something like 
Spark would be doing. If you want to determine rows per node, limit the token 
range to the one owned by that node. This is easier with a single token per 
node than with vnodes; with vnodes you have to repeat it for each of the 
node's num_tokens ranges.
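
As a rough illustration of the split-and-sum idea, here is a minimal Python 
sketch that only builds the per-range CQL statements. It assumes the default 
Murmur3Partitioner, whose tokens are signed 64-bit integers. The function 
names and the keyspace, table and column names are made up for the example, 
and this is not how the cassandra-count tool linked above is implemented.

```python
# Token bounds for Murmur3Partitioner (assumed here; other partitioners differ).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_range(n_splits):
    """Return n_splits contiguous, non-overlapping (start, end) pairs,
    both ends inclusive, that together cover the whole token space."""
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible token values
    ranges = []
    for i in range(n_splits):
        start = MIN_TOKEN + (total * i) // n_splits
        end = MIN_TOKEN + (total * (i + 1)) // n_splits - 1
        ranges.append((start, end))
    return ranges


def count_queries(keyspace, table, partition_key, n_splits):
    """Build one SELECT COUNT(*) statement per token subrange.

    partition_key is the partition key column list as it would appear
    inside token(...), e.g. "id" or "tenant, id" (hypothetical names).
    """
    template = (
        "SELECT COUNT(*) FROM {ks}.{tbl} "
        "WHERE token({pk}) >= {lo} AND token({pk}) <= {hi}"
    )
    return [
        template.format(ks=keyspace, tbl=table, pk=partition_key, lo=lo, hi=hi)
        for lo, hi in split_token_range(n_splits)
    ]


if __name__ == "__main__":
    # Hypothetical keyspace/table/column names, for illustration only.
    for query in count_queries("ks", "tbl", "id", 4):
        print(query)
```

Each statement would then be run through a driver, ideally in parallel, and 
the single-row results added up. More, smaller subranges keep each individual 
query short, which helps with timeouts on large tables.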


