I have a small, two-node cluster running Cassandra 2.2.1. I am seeing a lot
of these messages in both logs:

WARN  07:23:16 Not marking nodes down due to local pause of 7219277694 > 5000000000
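
For scale, the two numbers in that warning appear to be nanoseconds: the observed pause followed by the threshold. That reading is an assumption based on the 5000000000 value lining up with a 5-second limit. A minimal Python sketch that pulls the values out of such a line and converts them to seconds (the helper name parse_local_pause is made up for illustration):

```python
import re

# Matches the warning quoted above. Both numbers are ASSUMED to be
# nanoseconds: the observed pause first, then the threshold.
PAUSE_RE = re.compile(r"local pause of\s+(\d+)\s*>\s*(\d+)")


def parse_local_pause(line):
    """Return (pause_seconds, threshold_seconds), or None if no match."""
    match = PAUSE_RE.search(line)
    if match is None:
        return None
    pause_ns, threshold_ns = (int(group) for group in match.groups())
    return pause_ns / 1e9, threshold_ns / 1e9


sample = ("WARN  07:23:16 Not marking nodes down due to local pause of "
          "7219277694 > 5000000000")
pause_s, threshold_s = parse_local_pause(sample)
print(f"pause {pause_s:.2f} s, threshold {threshold_s:.2f} s")
```

Under that assumption, the message above corresponds to a pause of roughly 7.2 seconds against a 5-second threshold.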

I am fairly certain that they are not due to GC. I am not seeing a whole lot of
GC being logged, and nothing over 500 ms. I do think it is I/O related.

I am seeing lots of read timeouts for queries to a table that has a large and
growing number of SSTables. At last count there were over 1800 SSTables on
one node. The count is lower on the other node, and I suspect that this is
due to data distribution. Slowly but surely the number of SSTables keeps
going up, and not surprisingly nodetool tablehistograms reports high
latencies. The table is using STCS.

I am seeing some but not a whole lot of dropped mutations. nodetool tpstats
looks ok.

The growing number of SSTables really makes me think this is an I/O issue.
Cassandra is running in a Kubernetes cluster using a SAN, which is another
reason I suspect I/O.

What are some things I can look at/test to determine what is causing all of
the local pauses?

- John
