Hi all,

We are thinking about how best to proceed with availability testing of
Cassandra nodes. It is becoming more and more apparent that it is a rather
complex task. We thought that we should try to read from and write to each
Cassandra node, using a "monitoring" keyspace and a unique value with a
low TTL. This helps to find an issue, but it also triggers flapping of
unaffected hosts, as the key of the value being inserted sometimes
belongs to an affected host and sometimes does not. Now, we could
calculate the right key to insert so we can be sure it will hit the
host we are connecting to, but then you have replication factor and
consistency level, so you cannot really be sure that it actually tests
the ability of the given host to write values.
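
To make the problem concrete, here is a toy sketch of the idea (not
our real check). It assumes one token per node, SimpleStrategy-like
placement, and uses truncated MD5 as a stand-in for the real
partitioner; all function names and the node names are made up for
illustration.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 64  # toy token space, an assumption for this sketch


def token_for(key):
    """Stand-in partitioner: MD5 truncated to 64 bits (not Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(nodes):
    """Give each node one evenly spaced token; returns sorted (token, node) pairs."""
    n = len(nodes)
    return [(i * RING_SIZE // n, name) for i, name in enumerate(nodes)]


def replicas_for(key, ring, rf):
    """First node whose token >= key token, then the next rf-1 nodes clockwise."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


def key_owned_by(node, ring, prefix="probe-", max_attempts=100000):
    """Brute-force a key whose primary replica is `node`."""
    for i in range(max_attempts):
        key = f"{prefix}{i}"
        if replicas_for(key, ring, 1)[0] == node:
            return key
    raise LookupError(f"no key found for {node}")


def write_succeeds(key, ring, rf, required_acks, unable_to_write):
    """A write is acknowledged if enough replicas can still accept it."""
    healthy = [r for r in replicas_for(key, ring, rf) if r not in unable_to_write]
    return len(healthy) >= required_acks
```

With RF=3 and a single required acknowledgement, `write_succeeds`
still returns True when the targeted node cannot write, which is
exactly why the probe does not prove anything about that one host.
Requiring all three acknowledgements catches the broken node, but then
the probe also fails whenever either of the other two replicas is in
trouble.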

So we ended up thinking that the best approach is to connect to each
individual host, read from some system keyspace (which might be on a
different disk drive...), which should be local, and then check several
JMX values that could indicate an error, plus JVM statistics (full heap,
GC overhead). Moreover, we will monitor more closely our applications
that use Cassandra (mostly with the DataStax driver) and try to get
failed-node information from them.
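
Roughly, the per-host decision could look like the sketch below. It
only covers the evaluation step: collecting the numbers (the local
read result, heap usage such as the JVM's HeapMemoryUsage, accumulated
GC time) is left out, and the `NodeSample` type, the `evaluate`
function and both thresholds are assumptions made up for illustration,
not tuned values.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    host: str
    local_read_ok: bool    # did a read of a local system table via this host succeed?
    heap_used_bytes: int
    heap_max_bytes: int
    gc_time_ms: int        # GC time accumulated during the sampling window
    window_ms: int         # length of the sampling window


def evaluate(sample, max_heap_ratio=0.90, max_gc_ratio=0.20):
    """Return a list of problems for one host; an empty list means healthy."""
    problems = []
    if not sample.local_read_ok:
        problems.append("local read failed")
    if (sample.heap_max_bytes > 0
            and sample.heap_used_bytes / sample.heap_max_bytes > max_heap_ratio):
        problems.append("heap nearly full")
    if (sample.window_ms > 0
            and sample.gc_time_ms / sample.window_ms > max_gc_ratio):
        problems.append("GC overhead high")
    return problems
```

Because every input comes from the host itself, a failure here points
at that host only and should not make its neighbours flap.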

How do others do the testing?

Jirka H.
