Hello,

Since Sunday, we've been experiencing a really odd issue in our Cassandra cluster. We recently started receiving errors that messages are being dropped. But here is the odd part...
When looking in the AWS console, instead of seeing statistics elevated during this time, we actually see all statistics suddenly drop right before these messages appear. CPU, I/O, and network go way down. In one case, they went to 0 for about 5 minutes, to the point that other Cassandra nodes saw the node in question as down. The messages appear right after the node "wakes up".

We've had this happen on three different nodes on three different days since Sunday.

Other facts:

- We upgraded from m1.large to m1.xlarge instances about two weeks ago.
- We are running Cassandra 1.1.9.
- We've been doing some memory tuning, although I have seen this happen on untuned nodes.

Has anyone seen anything like this before?

Another related question: once we see messages being dropped on one node, our Cassandra client appears to see this and reports errors. We use LOCAL_QUORUM with an RF of 3 on all queries. Any idea why clients would see an error? If only one node reports an error, shouldn't the consistency level prevent the client from seeing an issue?

Thanks for your help,

-Mike