Hi all,

we are thinking about how best to proceed with availability testing of Cassandra nodes. It is becoming more and more apparent that this is a rather complex task. Our first idea was to connect to each Cassandra node and write, then read back, a unique value with a low TTL in a dedicated "monitoring" keyspace. This helps to find an issue, but it also triggers flapping on unaffected hosts, because the key of the value being inserted sometimes belongs to an affected host and sometimes does not. We could calculate a key that is sure to hit the host we are connecting to, but replication factor and consistency level still come into play, so you cannot really be sure that the probe actually tests the ability of the given host to write values.
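To illustrate the flapping, here is a minimal sketch of a toy token ring. It is not Cassandra's real placement logic: the MD5-based `token` function, the single token per node, the node names, and the helpers `ToyRing` and `probe_fails` are all made up for illustration (Cassandra's default partitioner is Murmur3, and real clusters usually use vnodes). It only shows that whether a probe fails depends on which replicas own the random key, not on which node we connected to.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # Toy token function (assumption): Cassandra's default partitioner
    # uses Murmur3, not MD5. Only the idea of hashing onto a ring matters here.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class ToyRing:
    """Hypothetical ring with one token per node and simple clockwise replication."""

    def __init__(self, nodes, rf):
        self.rf = rf
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        # First node clockwise from the key's token, plus the next rf-1 nodes.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


def probe_fails(ring, key, down, required_acks):
    # The probe fails when fewer replicas of the key are alive than the
    # consistency level requires, regardless of the coordinator we connected to.
    alive = [n for n in ring.replicas(key) if n not in down]
    return len(alive) < required_acks


ring = ToyRing([f"node{i}" for i in range(1, 7)], rf=3)
down = {"node3"}  # pretend one node is broken
keys = [f"probe-{i}" for i in range(1000)]  # stand-ins for the unique probe values

failed_all = sum(probe_fails(ring, k, down, required_acks=3) for k in keys)
failed_quorum = sum(probe_fails(ring, k, down, required_acks=2) for k in keys)
print(f"acks=3 (ALL): {failed_all}/{len(keys)} probes fail")
print(f"acks=2 (QUORUM): {failed_quorum}/{len(keys)} probes fail")
```

With all three acknowledgements required, only the probes whose key happens to be replicated on the broken node fail, so the alert comes and goes depending on the random key. With two acknowledgements required, a single broken node is never detected at all in this model, which is the replication factor and consistency level problem mentioned above.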
So we ended up thinking that the best approach is to connect to each individual host, read from a system keyspace (which might be on a different disk drive...), which should be local, and then check several JMX values that could indicate an error, plus JVM statistics (full heap, GC overhead). In addition, we will monitor the applications that use Cassandra (mostly through the DataStax driver) more closely and try to get failed-node information from them.

How do others do this testing?

Jirka H.
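A rough sketch of how the per-host checks above could be combined into one verdict. Everything here is an assumption for illustration: the `JvmSample` structure, the `evaluate` function, and both thresholds are hypothetical. The values themselves would have to be collected separately, for example heap usage from the standard `java.lang:type=Memory` MBean (`HeapMemoryUsage`) and cumulative GC time from the `java.lang:type=GarbageCollector` MBeans (`CollectionTime`), and the local read could be something like `SELECT release_version FROM system.local` on a connection pinned to that host.

```python
from dataclasses import dataclass

# Hypothetical thresholds; sensible values depend on the cluster.
HEAP_LIMIT = 0.90  # alert when more than 90% of max heap is used
GC_LIMIT = 0.20    # alert when more than 20% of wall-clock time is spent in GC


@dataclass
class JvmSample:
    timestamp_s: float    # when the sample was taken, in seconds
    heap_used_bytes: int  # HeapMemoryUsage.used
    heap_max_bytes: int   # HeapMemoryUsage.max (may be -1 if undefined)
    gc_time_ms: int       # cumulative CollectionTime summed over all collectors


def evaluate(prev: JvmSample, cur: JvmSample, local_read_ok: bool) -> list:
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    if not local_read_ok:
        problems.append("local system keyspace read failed")

    if cur.heap_max_bytes > 0:
        heap_ratio = cur.heap_used_bytes / cur.heap_max_bytes
        if heap_ratio > HEAP_LIMIT:
            problems.append(f"heap usage at {heap_ratio:.0%}")

    # GC overhead: share of the interval between two samples spent collecting.
    elapsed_ms = (cur.timestamp_s - prev.timestamp_s) * 1000
    if elapsed_ms > 0:
        gc_overhead = (cur.gc_time_ms - prev.gc_time_ms) / elapsed_ms
        if gc_overhead > GC_LIMIT:
            problems.append(f"GC overhead at {gc_overhead:.0%}")
    return problems


previous = JvmSample(timestamp_s=0, heap_used_bytes=40, heap_max_bytes=100, gc_time_ms=0)
healthy = JvmSample(timestamp_s=60, heap_used_bytes=50, heap_max_bytes=100, gc_time_ms=1000)
struggling = JvmSample(timestamp_s=60, heap_used_bytes=95, heap_max_bytes=100, gc_time_ms=30000)

print(evaluate(previous, healthy, local_read_ok=True))
print(evaluate(previous, struggling, local_read_ok=False))
```

Because each signal describes only the host being queried, a verdict built this way should not flap when some other node in the cluster is the one in trouble.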