Hello Ashish,
I don't think Cassandra exposes any metrics like that via the JMX
interface (which is where the Prometheus JMX exporter gets its metrics
from). However, you do have a few other options to achieve the same
goal, such as request tracing (nodetool settraceprobability), the slow
query log (slow_query_log_timeout_in_ms in the cassandra.yaml, but be
mindful that it can be misleading, as all queries in flight during a
long STW GC pause will be logged as slow) and the new full query log
feature in Cassandra 4 (full_query_logging_options in the
cassandra.yaml).
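For example (the exact values below are only illustrative, pick ones
that suit your workload and Cassandra version):

    # trace roughly 0.1% of requests on this node
    nodetool settraceprobability 0.001

    # in cassandra.yaml: log queries slower than 500 ms
    slow_query_log_timeout_in_ms: 500

    # Cassandra 4: turn on the full query log at runtime
    # (or configure full_query_logging_options in cassandra.yaml)
    nodetool enablefullquerylog --path /path/to/fql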
But, to be honest, I don't think a single read on one 80MB partition
will cause any GC issue at all. It's more likely to be a range query
(e.g.: SELECT ... FROM ... WHERE TOKEN(pk) > m AND TOKEN(pk) < n),
repeated reads of the same partition in a very short period of time
(e.g.: a bad retry policy, or hot partitions), very bursty requests
(where the peak has exceeded the node's capacity), or a large number
of tombstones (check the logs, see the example below). Or, more often,
a combination of those.
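If you suspect tombstones, the warnings Cassandra writes to its system
log are a quick way to confirm it, for example (the log path may
differ on your installation):

    grep -i tombstone /var/log/cassandra/system.log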
I'm quite interested to find out how you know it's table 'x' that is
responsible for the long GC pauses. Have you got some concrete
evidence for that? If you aren't sure, you may want to keep an open
mind, as the root cause could be something else, such as repair
sessions (merkle tree size) or hinted handoff (malformed writes can
get stuck in hinted handoff and be retried repeatedly until they
expire, at least this was true in early 3.x versions).
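A few nodetool commands are a cheap first check to rule those in or
out, for example (the output differs a bit between versions):

    nodetool compactionstats  # validation compactions => repair running
    nodetool netstats         # streaming sessions, read repair stats
    nodetool tpstats          # pending/blocked stages, incl. hints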
If you want to dig deep into the root cause, I personally like the
approach of taking and analysing a snapshot of the JVM heap during the
long GC pause, provided the pause is long enough (a few seconds should
be sufficient). You can write a script that reads the GC log file and
takes a heap dump when the JVM is in a long STW GC, and you will then
be able to see exactly what is in the heap when it happens. I find the
heap dump often gives me very useful insight into the exact cause of
the long GC pause.
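To illustrate the idea, here is a minimal sketch of such a script. It
assumes Java 8 style GC logs with -XX:+PrintGCApplicationStoppedTime
enabled and GNU grep; the log path, the 3 second threshold and the way
the Cassandra PID is found are all assumptions you would need to adapt:

    #!/usr/bin/env bash
    # Watch the GC log, dump the heap after an unusually long STW pause.
    GC_LOG=/var/log/cassandra/gc.log   # assumed path
    THRESHOLD=3.0                      # seconds, pick your own
    PID=$(pgrep -f CassandraDaemon)    # assumes one Cassandra process

    tail -F "$GC_LOG" | while read -r line; do
        # matches "Total time for which application threads were stopped: N seconds"
        secs=$(echo "$line" | grep -oP 'stopped: \K[0-9.]+')
        if [ -n "$secs" ] && awk -v s="$secs" -v t="$THRESHOLD" 'BEGIN { exit !(s >= t) }'; then
            jmap -dump:format=b,file=/tmp/cassandra-heap-$(date +%s).hprof "$PID"
        fi
    done

Bear in mind the heap dump itself is disruptive (it pauses the JVM and
writes a multi-GB file), so you probably only want this running while
you are actively investigating.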
Additionally, or alternatively, if you are using ZFS for the Cassandra
data directory like I do, the ZFS debug log can give you a lot more
insight into what exactly Cassandra had read from or written to the
filesystem, down to the specific filename and offset, and from there
you will be able to reconstruct what happened to Cassandra right
before a long GC pause.
Happy GC issue hunting and perhaps GC tuning too :-)
Cheers,
Bowen
On 11/09/2021 16:39, MyWorld wrote:
Hi all,
We are using Prometheus + Grafana for monitoring Apache Cassandra with
a scrape interval of 15s. We have a table 'x' with partition sizes
varying from 2MB to 80MB.
We know there are a few big partition entries present in this table,
and my objective is to monitor when such a big partition is read from
Cassandra (as it can be a cause of a large GC pause).
Now, in Prometheus, how can I figure out the "size of total data read"
from table 'x' in the last 15s? What formula can be applied?
Regards,
Ashish