Aggregate queries (like count(*)) are fine *within* a reasonably sized partition (under 100 MB). However, Cassandra is not the right tool for aggregate queries *across* partitions (unless you break up the work with something like Spark). Choosing the right partition key and values IS the goal of Cassandra data modeling. (Clustering keys are used for ordering data within a partition.)
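One common way to apply this, and to approach Karthik's first question further down the thread, is to add a bucket to the partition key. One logical key is then spread over several partitions, each small enough for a single-partition count(*), and the client sums the results. The Python sketch below only computes the bucket layout and the bind values; it does not talk to a cluster. The `bucket` and `row_id` columns, the schema in the comment, and every function name are assumptions for illustration, not anything from this thread. Each per-bucket count still reads every row in its partition, so bucketing avoids one oversized read timing out and lets the counts run in parallel, but it does not make counting cheaper overall.

```python
from __future__ import annotations

import math
import zlib

# Size guideline quoted in this thread; smaller partitions are safer still.
TARGET_PARTITION_MB = 100

# Hypothetical schema (the bucket and row_id columns are assumptions):
#   CREATE TABLE my_table (
#       my_partition_key text, bucket int, row_id text, ...,
#       PRIMARY KEY ((my_partition_key, bucket), row_id));
COUNT_CQL = "SELECT count(*) FROM my_table WHERE my_partition_key = ? AND bucket = ?"


def buckets_needed(estimated_mb: float, target_mb: float = TARGET_PARTITION_MB) -> int:
    """Fewest buckets that keep each one under target_mb, assuming rows
    spread roughly evenly across buckets."""
    return max(1, math.ceil(estimated_mb / target_mb))


def bucket_for(row_id: str, num_buckets: int) -> int:
    """Deterministic bucket for a row at write time. crc32 returns the same
    value in every process, unlike Python's salted built-in hash() for str."""
    return zlib.crc32(row_id.encode("utf-8")) % num_buckets


def count_query_params(key: str, num_buckets: int) -> list[tuple[str, int]]:
    """Bind values for COUNT_CQL: one single-partition count per bucket."""
    return [(key, b) for b in range(num_buckets)]


def total_count(per_bucket_counts: list[int]) -> int:
    """Client-side sum of the per-bucket count(*) results."""
    return sum(per_bucket_counts)
```

With Karthik's estimate, `buckets_needed(464)` gives 5 buckets of roughly 93 MB each; passing `target_mb=50` gives 10, which leaves more headroom. The bucket count has to be fixed before loading, because readers must enumerate every bucket and changing it later means rewriting the data.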
Good:
Select count(*) from my_table where my_partition_key = '1'; -- and the partition is 100 MB or less

Not good:
Select count(*) from my_table;

Are the counts the actual workload, or just a measure of the completion of the load? You want to model the data to satisfy the queries for the workload. Queries should be very simple; getting the data model right is the hard work. Make sure that Cassandra fits the use case you have.

Sean R. Durity
DB Solutions
Staff Systems Engineer – Cassandra

From: Karthik K <mailidofkarthike...@gmail.com>
Sent: Wednesday, September 28, 2022 8:48 AM
To: user@cassandra.apache.org
Cc: rsesha...@altimetrik.com
Subject: [EXTERNAL] Re: Questions on the count and multiple index behaviour in cassandra

Hi Stéphane Alleaume,

Thanks for your quick response. I have attached the table stats from running the nodetool cfstats command to get the size. If I am correct, the partition size must be 464 MB. However, when I exported the data as CSV, the size was 1510 MB.

1) If we segment this 464 MB of data into more partitions, say with each partition under 100 MB, will the count(*) query work effectively?
2) What will be the approximate response time in seconds if we run a select count(*) against 1 million records?
3) Though Elasticsearch is a future option, we want to dig further into Cassandra to achieve this. Is there any workaround using data modeling?
Thanks & Regards,
Karthikeyan K

On Wed, Sep 28, 2022 at 5:31 PM Stéphane Alleaume <crystallo...@gmail.com> wrote:

Hi,
1) How big, in MB, is your partition? It should be less than 100 MB (in practice, even less).
2) Could you put an Elasticsearch or Solr search in front?

Kind regards,
Stephane

On Wed, 28 Sep 2022, 13:46, Karthik K <mailidofkarthike...@gmail.com> wrote:

Hi,

We have two questions about Cassandra 3.11 features:

1) We need to get row counts from a Cassandra table. We have a 3-node cluster running Apache Cassandra 3.11. We loaded a table with 9 lakh (900,000) records. The table has around 91 columns, most of them of the text data type. All 9 lakh records are in a single partition key. When we tried a select count(*) query with that partition key, the query timed out. However, we were able to retrieve the count through multiple calls, fetching only 1 lakh (100,000) records in each call. The only disadvantage is the time taken, which is around 1 minute and 3 seconds. Is there any other approach to get the row count faster in Cassandra? Do we need to change the data modeling approach to achieve this? Suggestions are welcome.

2) How should we model data in Cassandra to support multiple filters? We may also need the row count for such a multiple-filter query.

Thanks & Regards,
Karthikeyan
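For the second question (multiple filters), Sean's advice to model the data for the queries usually translates into one table per query: each row is written to every table whose partition key matches a filter the application needs, so every filtered read, including a count, hits a single partition. The sketch below is an in-memory Python model of that idea, not driver code; the table names, the `status`/`region`/`row_id` columns, and the `QueryTables` class are all hypothetical. Each of these partitions is still subject to the same size guideline, so a low-cardinality filter such as a status value would need bucketing as well.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical filter combinations the application must answer. In Cassandra,
# each entry would be its own table whose partition key is these columns.
QUERY_TABLES = {
    "rows_by_status": ("status",),
    "rows_by_status_and_region": ("status", "region"),
}


class QueryTables:
    """In-memory model of query-first denormalization: every write goes to
    each query table, keyed by that table's partition-key columns."""

    def __init__(self, query_tables: dict[str, tuple[str, ...]]):
        self.query_tables = query_tables
        # table name -> partition key tuple -> {row_id: row}
        self.data = {name: defaultdict(dict) for name in query_tables}

    def write(self, row: dict) -> None:
        for name, key_cols in self.query_tables.items():
            partition = tuple(row[col] for col in key_cols)
            # Same primary key overwrites, like a Cassandra upsert. This sketch
            # ignores updates that change a filter column; a real model would
            # also delete the row from its old partition.
            self.data[name][partition][row["row_id"]] = row

    def count(self, table: str, *partition: str) -> int:
        """Count within one partition only (the 'good' single-partition case).
        dict.get does not create an empty partition on a miss."""
        return len(self.data[table].get(tuple(partition), {}))


store = QueryTables(QUERY_TABLES)
store.write({"row_id": "r1", "status": "open", "region": "east"})
store.write({"row_id": "r2", "status": "open", "region": "west"})
print(store.count("rows_by_status", "open"))  # 2
print(store.count("rows_by_status_and_region", "open", "east"))  # 1
```

The trade-off is the usual one for Cassandra: writes are multiplied by the number of query tables so that each read stays a simple single-partition lookup.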