Looks like this guy (Brian Hess) wrote a script to split the token range and run count(*) on each subrange:
https://github.com/brianmhess/cassandra-count

- Max

> On Apr 8, 2016, at 10:56 pm, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>
> SELECT COUNT(*) probably works (with internal paging) on many datasets with
> enough time and assuming you don’t have any partitions that will kill you.
>
> No, it doesn’t count extra replicas / duplicates.
>
> The old way to do this (before paging / fetch size) was to use manual paging
> based on tokens/clustering keys:
>
> https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html – SELECT’s
> WHERE clause can use token(), which is what you’d want to use to page through
> the whole token space.
>
> You could, in theory, issue thousands of queries in parallel, all for
> different token ranges, and then sum the results. That’s what something like
> spark would be doing. If you want to determine rows per node, limit the token
> range to that owned by the node (easier with 1 token than vnodes, with vnodes
> repeat num_tokens times).
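
For illustration, here is a rough sketch of the split-and-sum idea (not the code from the cassandra-count repo, just a minimal version of the same approach). It assumes the cluster uses Murmur3Partitioner, so tokens run from -2^63 to 2^63-1. The names split_token_range, count_query, count_rows and the run_count callable are all made up for this example; run_count stands for whatever executes one CQL statement and returns the count as an integer.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: Murmur3Partitioner, whose tokens are signed 64-bit integers.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_range(num_splits):
    """Return num_splits contiguous, inclusive (start, end) token subranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    step = (MAX_TOKEN - MIN_TOKEN + 1) // num_splits
    ranges = []
    start = MIN_TOKEN
    for i in range(num_splits):
        # The last subrange absorbs any remainder so the whole ring is covered.
        end = MAX_TOKEN if i == num_splits - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def count_query(table, partition_key, start, end):
    """Build a COUNT(*) query restricted to one token subrange."""
    return (
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE token({partition_key}) >= {start} "
        f"AND token({partition_key}) <= {end}"
    )


def count_rows(run_count, table, partition_key, num_splits=1000, workers=16):
    """Run one COUNT(*) per subrange in parallel and sum the results.

    run_count is a hypothetical callable: it takes a CQL string and
    returns the integer count for that query.
    """
    queries = [
        count_query(table, partition_key, start, end)
        for start, end in split_token_range(num_splits)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(run_count, queries))
```

With the DataStax Python driver, run_count could probably be something like `lambda q: session.execute(q).one()[0]`, but I have not tested that against a real cluster. To count rows per node, you would replace split_token_range with the token ranges that node owns, as Jeff describes.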