Agreed that anything requiring a full table scan, short of batch analytics, is an antipattern, although the goal here is not to do a full scan per se, but just to get the row count. It still surprises people that Cassandra cannot quickly return COUNT(*). The easy answer: use DSE Search and run a Solr query for q=*:*, which will very quickly return the total row count. I presume that Stratio will handle this fine as well.
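A minimal sketch of the DSE Search route follows. It assumes a table that already has a search index, that DSE serves Solr's standard `/select` handler on the default port 8983, and that the core is named `<keyspace>.<table>`. The helper names (`solr_count_url`, `parse_num_found`, `solr_row_count`) are hypothetical, not part of any DSE API.

```python
"""Sketch: read a table's row count from a DSE Search (Solr) core."""
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_count_url(host: str, keyspace: str, table: str, port: int = 8983) -> str:
    # Assumption: the search core is named "<keyspace>.<table>" and exposes
    # Solr's standard /select handler. rows=0 asks for the count only, no documents.
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"http://{host}:{port}/solr/{keyspace}.{table}/select?{params}"


def parse_num_found(body: str) -> int:
    # Solr's JSON response carries the total hit count in response.numFound.
    return int(json.loads(body)["response"]["numFound"])


def solr_row_count(host: str, keyspace: str, table: str,
                   port: int = 8983, timeout: float = 10.0) -> int:
    # Issue the match-all query and return numFound from the JSON reply.
    with urlopen(solr_count_url(host, keyspace, table, port), timeout=timeout) as resp:
        return parse_num_found(resp.read().decode("utf-8"))
```

Because the count comes from the search index rather than a scan, it reflects indexed rows and may trail very recent writes until the index is refreshed.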
-- Jack Krupansky

On Mon, Apr 11, 2016 at 11:10 AM, <sean_r_dur...@homedepot.com> wrote:

> Cassandra is not good for table scan type queries (which count(*)
> typically is). While there are some attempts to do that (as noted below),
> this is a path I avoid.
>
> Sean Durity
>
> *From:* Max C [mailto:mc_cassan...@core43.com]
> *Sent:* Saturday, April 09, 2016 6:19 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: 1, 2, 3...
>
> Looks like this guy (Brian Hess) wrote a script to split the token range
> and run count(*) on each subrange:
>
> https://github.com/brianmhess/cassandra-count
>
> - Max
>
> On Apr 8, 2016, at 10:56 pm, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
> SELECT COUNT(*) probably works (with internal paging) on many datasets
> with enough time and assuming you don’t have any partitions that will kill
> you.
>
> No, it doesn’t count extra replicas / duplicates.
>
> The old way to do this (before paging / fetch size) was to use manual
> paging based on tokens/clustering keys:
>
> https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html –
> SELECT’s WHERE clause can use token(), which is what you’d want to use to
> page through the whole token space.
>
> You could, in theory, issue thousands of queries in parallel, all for
> different token ranges, and then sum the results. That’s what something
> like spark would be doing. If you want to determine rows per node, limit
> the token range to that owned by the node (easier with 1 token than vnodes,
> with vnodes repeat num_tokens times).
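Jeff's split-and-sum idea, which is similar in spirit to the subrange approach Max links, can be sketched as follows. Assumptions: the table uses Murmur3Partitioner (tokens are signed 64-bit integers in the interval (-2^63, 2^63-1]); `run_count` is a hypothetical caller-supplied function that executes one CQL string and returns its single count; `ks.tbl` and `id` are placeholder table and partition-key names. This is not the cassandra-count script's actual code.

```python
"""Sketch: count rows by summing COUNT(*) over token sub-ranges in parallel."""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Tuple

# Assumption: Murmur3Partitioner (the default since Cassandra 1.2), whose
# tokens are signed 64-bit integers in the interval (MIN_TOKEN, MAX_TOKEN].
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1

TokenRange = Tuple[int, int]  # (start, end]: exclusive start, inclusive end


def split_token_range(n: int) -> List[TokenRange]:
    """Split the whole ring into n contiguous (start, end] sub-ranges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // n for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def unwrap(ranges: Iterable[TokenRange]) -> Iterator[TokenRange]:
    """Split any range that wraps past MAX_TOKEN into two non-wrapping ones."""
    for start, end in ranges:
        if start < end:
            yield (start, end)
        else:  # e.g. a node's range that wraps around the end of the ring
            yield (start, MAX_TOKEN)
            yield (MIN_TOKEN, end)


def count_query(table: str, pk: str, start: int, end: int) -> str:
    """CQL COUNT(*) limited to one sub-range; pk lists the partition key column(s)."""
    return (f"SELECT COUNT(*) FROM {table} "
            f"WHERE token({pk}) > {start} AND token({pk}) <= {end}")


def parallel_count(run_count: Callable[[str], int], table: str, pk: str,
                   ranges: Iterable[TokenRange], workers: int = 8) -> int:
    """Run one COUNT(*) per sub-range on a bounded thread pool and sum the results."""
    queries = [count_query(table, pk, s, e) for s, e in unwrap(ranges)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(run_count, queries))
```

For a whole-table count you would call something like `parallel_count(run_count, "ks.tbl", "id", split_token_range(256))`; with the DataStax Python driver, `run_count` might be `lambda q: session.execute(q).one()[0]`, though that wiring is an assumption. For the per-node count Jeff describes, pass the node's own ranges instead (one per vnode, so `num_tokens` of them); `unwrap` takes care of the range that wraps around the ring. The fixed-size pool caps how many queries hit the cluster at once. Each sub-range still reads every partition it covers, so the total work is the same as a full scan, just spread out.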