Based on a recent inquiry and a recent thread of my own, and the coming support for wide rows, I'll focus in on this question that I feel needs better documentation of recommended best practice:
When can the COUNT(*) aggregate row-counting function be used? Accept for relatively small or narrow queries, it seems to have a propensity for timing out. >From what I gather (and without specific testing myself), SELECT COUNT(*) seems to be warranted for use cases where: 1. The table is relatively small, or 2. The number of rows selected by the WHERE clause is relatively small, due to a combination of a relatively small number of partitions and a relatively narrow slice of rows in a partition. One can always approximate a larger SELECT by performing multiple selects, each specifying a limited range of tokens in the WHERE clause. But, even if the number of partitions selected is relatively small, it is possible that the partitions might be wide so that the number of rows is not so small. And with CASSANDRA-11206 <https://issues.apache.org/jira/browse/CASSANDRA-11206>, very large partitions (2 GB) could have a non-trivial number of rows. So, an open question is how small is reasonable for COUNT. Thousands? Tens of thousands? Only Hundreds? Obviously it does depend on your particular data and hardware, but just on a general basis. In any case, does anyone have any specific experiences to share as to what recommendations they would have for the use of COUNT? A companion question is whether COUNT(column_name) has the same limitations and recommendations. It does have to actually fetch the column values as opposed to simply determining the existence of the row, but how consequential that additional processing is, I couldn't say. -- Jack Krupansky