Are there specific queries that are slow? Partition-key queries should have read latencies in the single digits of ms (or faster). If that is not what you are seeing, I would first review the data model and queries to make sure that the data is modeled properly for Cassandra. Without metrics, I would start at 16-20 GB of RAM for Cassandra on each node (or 31 GB if you can get 64 GB per host).
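To answer the "which queries are slow" question with numbers, one option is to time individual queries from the client side and look at the percentiles. The following is only a minimal sketch: `time_query` and its parameters are hypothetical names, and the figures it reports include network and driver overhead, so they will be somewhat higher than the server-side read latency.

```python
import statistics
import time


def time_query(run_query, repeats=200):
    """Call run_query() `repeats` times; return client-side latency percentiles in ms.

    run_query is any zero-argument callable that executes one query
    (hypothetical helper, not part of any Cassandra driver).
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2 to compute percentiles")

    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    # quantiles(n=100) returns 99 cut points: index 49 is p50, 94 is p95, 98 is p99.
    # method="inclusive" interpolates between observed samples only.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples_ms),
    }


# Example use, assuming the DataStax Python driver and an existing
# `session`, a statement `prepared` once at startup, and a partition key `key`:
#   stats = time_query(lambda: session.execute(prepared, (key,)))
#   print(stats)
```

If the p50 for a partition-key read is well above single-digit ms, that query and its table are the first place I would look.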
Since these are VMs, is there any chance they are competing for resources on the same physical host? In my (limited) VM experience, VMs can be 10x slower than physical hosts with local SSDs. (They don't have to be slower, but it can be harder to get visibility into the actual bottlenecks.)

I would also look to see what consistency level is being used with the queries. In most cases LOCAL_QUORUM or LOCAL_ONE is preferred. Does the app use prepared statements that are only prepared once per app invocation? Any LWT/"if exists" in your code?

Sean Durity

From: Attila Wind <attilaw@swf.technology>
Sent: Friday, March 5, 2021 9:48 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] underutilized servers

Hi guys,

I have a DevOps related question - hope someone here could give some ideas/pointers...

We are running a 3-node Cassandra cluster. Recently we realized we have performance issues, and based on the investigation we did, it seems our bottleneck is the Cassandra cluster. The application layer is waiting a lot for Cassandra ops. So queries are running slow on the Cassandra side, yet according to our monitoring the Cassandra servers still have lots of free resources...

The Cassandra machines are virtual machines (we own the physical hosts too) built with kvm, with 6 CPU cores (3 physical) and 32GB RAM dedicated to each. We are using the Ubuntu Linux 18.04 distro, the same version everywhere (on the physical and virtual hosts). We are running Cassandra 4.0-alpha4.

What we see is

* CPU load is around 20-25% - so we have lots of spare capacity
* iowait is around 2-5% - so disk bandwidth should be fine
* network load is around 50% of the full available bandwidth
* loadavg is max around 4 - 4.5 but typically around 3 (with 6 CPUs, a loadavg of 6 should represent 100% load)

and still, query performance is slow... and we do not understand what could be holding Cassandra back from fully utilizing the server resources...

We are clearly missing something! Anyone any idea / tip?

thanks!
--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932