Hello Ashish,

I don't think Cassandra exposes any metrics like that via the JMX interface (which is where the Prometheus JMX exporter gets its metrics from). However, you do have a few other options to achieve the same goal, such as request tracing (nodetool settraceprobability), the slow query log (slow_query_log_timeout_in_ms in the cassandra.yaml, but be mindful that it can be misleading, as all queries in flight during a long STW GC pause will be logged as slow) and the new full query log feature (full_query_logging_options in the cassandra.yaml) in Cassandra 4.
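For example, something along these lines (option names are from memory, so double-check them against your version's cassandra.yaml and nodetool help; the paths and values are just placeholders):

# trace 0.1% of the requests handled by this node
nodetool settraceprobability 0.001

# cassandra.yaml: log queries taking longer than 500 ms
slow_query_log_timeout_in_ms: 500

# Cassandra 4 only, cassandra.yaml: point the full query log at a directory...
full_query_logging_options:
    log_dir: /path/to/fql

# ...then switch it on at runtime
nodetool enablefullquerylog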

But, to be honest, I don't think a single read of one 80MB partition will cause any GC issue at all. It's more likely to be a range query (e.g.: SELECT ... FROM ... WHERE TOKEN(pk) > m AND TOKEN(pk) < n), repeatedly reading the same partition in a very short period of time (e.g.: a bad retry policy, or hot partitions), very bursty requests (where the peak has exceeded the node's capacity), or a large number of tombstones (check the logs). Or, more often, a combination of those.
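If you want a quick way to check the tombstone theory, something like the following should do (the log path below is the usual package-install default, so adjust it to your setup, and swap in your own keyspace name):

# warnings logged when a read crosses tombstone_warn_threshold
grep -i tombstone /var/log/cassandra/system.log

# per-table statistics, including tombstones per slice
nodetool tablestats my_keyspace.x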

I'm pretty interested to find out how you know it's the table 'x' that is responsible for the long GC pauses. Do you have some concrete evidence for that? If you aren't sure, you may want to keep an open mind, as the root cause could be something else, such as repair sessions (merkle tree size) or hinted handoff (malformed writes can get stuck in hinted handoff and be retried repeatedly until they expire; at least this was true in early 3.x versions).
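A couple of quick ways to rule those out (the hints path below is the usual package-install default, so adjust it to your hints_directory):

# any "Validation" entries here mean merkle trees are being built for a repair
nodetool compactionstats

# hint files accumulating here indicate a hinted handoff backlog
ls -lh /var/lib/cassandra/hints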

If you want to dig deep into the root cause, I personally like the approach of taking and analysing a snapshot of the JVM heap during a long GC pause, provided the pause is long enough (a few seconds should be sufficient). You can write a script that reads the GC log file and takes a heap dump when the JVM hits a long STW GC, and you will then be able to see exactly what is in the heap when it happens. I find the heap dump often gives me very useful insight into the exact cause of the long GC pause. Additionally, or alternatively, if you are using ZFS for the Cassandra data directory like I do, the ZFS debug log can give you a lot more insight into what exactly Cassandra had read from or written to the filesystem, down to the specific filename and offset, and from there you will be able to reconstruct what happened to Cassandra right before a long GC pause.
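Just to illustrate the idea, such a watcher script could look roughly like the sketch below. It assumes JDK 8 style GC logging with -XX:+PrintGCApplicationStoppedTime enabled (the "Total time for which application threads were stopped" lines); the log path, dump directory, pause threshold and PID lookup are all placeholders you would need to adapt, and the regex will need changing for the unified (JDK 11+) GC log format.

#!/usr/bin/env python3
# Rough sketch: watch the GC log and take a heap dump right after a long
# stop-the-world pause is reported (you cannot attach while the JVM is
# actually stopped, so right after the pause is the best you can do).
# Run it as the same user as Cassandra so jmap can attach.
import re
import subprocess
import time

GC_LOG = "/var/log/cassandra/gc.log"   # placeholder: your gc.log location
PAUSE_THRESHOLD_SECS = 3.0             # what counts as a "long" pause
DUMP_DIR = "/var/tmp"                  # placeholder: where dumps are written

# JDK 8 style line, e.g.
# "Total time for which application threads were stopped: 4.1234567 seconds"
PAUSE_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds")


def cassandra_pid():
    # crude PID lookup; replace with whatever suits your environment
    out = subprocess.check_output(["pgrep", "-f", "CassandraDaemon"])
    return out.decode().split()[0]


def take_heap_dump(pid):
    dump_file = f"{DUMP_DIR}/cassandra-{int(time.time())}.hprof"
    # plain (non-live) dump, so the garbage that caused the pause is kept
    subprocess.run(["jmap", f"-dump:format=b,file={dump_file}", pid], check=False)
    print(f"heap dump written to {dump_file}")


def follow(path):
    # minimal "tail -f"; does not handle log rotation
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)
                continue
            yield line


if __name__ == "__main__":
    pid = cassandra_pid()
    for line in follow(GC_LOG):
        m = PAUSE_RE.search(line)
        if m and float(m.group(1)) >= PAUSE_THRESHOLD_SECS:
            take_heap_dump(pid)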

Happy GC issue hunting and perhaps GC tuning too :-)


Cheers,

Bowen


On 11/09/2021 16:39, MyWorld wrote:
Hi all,

We are using Prometheus + Grafana for monitoring Apache Cassandra with a scrape interval of 15s. We have a table 'x' with partition sizes varying from 2MB to 80MB. We know there are a few big partition entries present in this table, and my objective is to monitor when such a big partition entry is read from Cassandra (as it can be a cause of a large GC pause). Now, in Prometheus, how can I figure out the "size of total data read" from table 'x' in the last 15s? What formula can be applied?

Regards,
Ashish
