Cassandra is not well suited to full-table-scan queries, which is what count(*) typically is. There are some attempts to make it work (as noted below), but it is a path I avoid.
Sean Durity

From: Max C [mailto:mc_cassan...@core43.com]
Sent: Saturday, April 09, 2016 6:19 PM
To: user@cassandra.apache.org
Subject: Re: 1, 2, 3...

Looks like this guy (Brian Hess) wrote a script to split the token range and run count(*) on each subrange: https://github.com/brianmhess/cassandra-count

- Max

On Apr 8, 2016, at 10:56 pm, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

SELECT COUNT(*) probably works (with internal paging) on many datasets, given enough time and assuming you don't have any partitions that will kill you. No, it doesn't count extra replicas / duplicates.

The old way to do this (before paging / fetch size) was to use manual paging based on tokens/clustering keys: https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html. The SELECT's WHERE clause can use token(), which is what you'd want to use to page through the whole token space.

You could, in theory, issue thousands of queries in parallel, all for different token ranges, and then sum the results. That's what something like Spark would be doing. If you want to determine rows per node, limit the queries to the token ranges owned by that node (easier with a single token than with vnodes; with vnodes, repeat for each of the node's num_tokens ranges).
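Jeff's suggestion (split the token space into subranges, run a COUNT(*) on each in parallel, then sum the results) can be sketched roughly as below. This is an illustration under stated assumptions, not Brian Hess's script or any library's API: it assumes the default Murmur3Partitioner (signed 64-bit tokens from -2^63 to 2^63-1), and the function names `split_token_range`, `count_statements` and `total_count` are hypothetical. The caller supplies `run`, a function that executes one CQL string and returns its count, so the sketch does not depend on any particular driver.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Assumption: the default Murmur3Partitioner, whose tokens are signed 64-bit longs.
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_range(n: int) -> List[Tuple[int, int]]:
    """Split the full token ring into n contiguous (lo, hi] subranges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    width = (MAX_TOKEN - MIN_TOKEN) // n
    bounds = [MIN_TOKEN + i * width for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def count_statements(keyspace: str, table: str, pk: str,
                     ranges: List[Tuple[int, int]]) -> List[str]:
    """Build one SELECT COUNT(*) per (lo, hi] token range (hypothetical helper)."""
    stmts = []
    for lo, hi in ranges:
        # The range starting at the ring minimum is made inclusive so no token is skipped.
        op = ">=" if lo == MIN_TOKEN else ">"
        stmts.append(
            f"SELECT COUNT(*) FROM {keyspace}.{table} "
            f"WHERE token({pk}) {op} {lo} AND token({pk}) <= {hi}"
        )
    return stmts


def total_count(stmts: List[str], run: Callable[[str], int],
                workers: int = 8) -> int:
    """Execute the statements concurrently via `run` and sum the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(run, stmts))
```

With the DataStax Python driver, `run` might be something like `lambda q: session.execute(q).one()[0]`, though that wrapper is an assumption here. Each subrange query still scans its slice of data, so `n` would need to be large enough that every query finishes within the read timeout. To count rows in one node's ranges instead of the whole ring, pass that node's (lo, hi] ranges to `count_statements` in place of `split_token_range(n)`; with vnodes that is one range per token, and a range that wraps past the maximum token would have to be split in two.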