Hi, I've begun experiencing very high tail latencies across my clusters. Cassandra's internal metrics report read latencies under 1 ms. However, when I measure round trips of QUERY/EXECUTE frames from within the driver in my applications, 90th-percentile round-trip times reach up to a second for very basic queries (SELECT a,b FROM table WHERE pk=x).
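To make the gap between a sub-millisecond typical latency and a one-second 90th percentile concrete, here is a minimal sketch. It assumes the driver-side round-trip times have already been collected into a list of seconds. The `percentile` helper and the sample values are hypothetical and only illustrate the nearest-rank method; they are not measurements from my clusters.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer-friendly arithmetic avoids float rounding in the rank computation.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical round-trip times in seconds: mostly sub-millisecond, with a slow tail.
rtts = [0.0008] * 85 + [0.004] * 4 + [0.9] * 11
print(percentile(rtts, 50))  # median stays sub-millisecond
print(percentile(rtts, 90))  # p90 lands in the slow tail
```

With only 11% of requests stalling, the median still looks healthy while p90 jumps to almost a second, which is why server-side averages can hide this kind of tail.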
I've been studying the logs to try to get a handle on what could be going wrong. I don't think there are GC issues. However, the logs report messages dropped due to timeouts while the thread pools are nearly empty: https://gist.github.com/nemothekid/28b2a8e8353b3e60d7bbf390ed17987c

The relevant line:

REQUEST_RESPONSE messages were dropped in last 5000 ms: 1 for internal timeout and 0 for cross node timeout. Mean internal dropped latency: 54930 ms and Mean cross-node dropped latency: 0 ms

Are there any tools I can use to start understanding what is causing these issues?

Nimi
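For tallying dropped-message lines like the REQUEST_RESPONSE one quoted in this post across a whole log file, a small parser can help. This is a hedged sketch that assumes every such line uses exactly the wording quoted here. The regex, the `parse_dropped` name, and the field names are my own assumptions, not part of any Cassandra tool, and other Cassandra versions may phrase the message differently.

```python
import re

# Pattern written against the single log line quoted in the post; search() lets it
# match even when the line carries a log-level/thread/timestamp prefix.
DROPPED_RE = re.compile(
    r"(?P<verb>\w+) messages were dropped in last (?P<window>\d+) ms: "
    r"(?P<internal>\d+) for internal timeout and (?P<cross>\d+) for cross node timeout\. "
    r"Mean internal dropped latency: (?P<internal_ms>\d+) ms and "
    r"Mean cross-node dropped latency: (?P<cross_ms>\d+) ms"
)

def parse_dropped(line):
    """Return the numeric fields of a dropped-messages line as a dict, or None if it doesn't match."""
    m = DROPPED_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Keep the message type as a string; convert every count/duration to int.
    return {"verb": fields["verb"],
            **{k: int(v) for k, v in fields.items() if k != "verb"}}

line = ("REQUEST_RESPONSE messages were dropped in last 5000 ms: 1 for internal timeout "
        "and 0 for cross node timeout. Mean internal dropped latency: 54930 ms and "
        "Mean cross-node dropped latency: 0 ms")
print(parse_dropped(line))
```

Running this over every line of the log and grouping by `verb` gives a per-window count of internal versus cross-node drops, which can then be lined up against the driver-side latency spikes.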