Hello,

Since Sunday, we've been experiencing a really odd issue in our Cassandra
cluster.  We recently started receiving errors that messages are being
dropped.  But here is the odd part...

When looking in the AWS console, instead of seeing elevated statistics
during this time, we actually see all statistics suddenly drop right before
these messages appear.  CPU, I/O, and network go way down.  In fact, in one
case, they went to 0 for about 5 minutes, to the point that other Cassandra
nodes saw the node in question as being down.  The messages appear right
after the node "wakes up".

We've had this happen on three different nodes on three different days since
Sunday.

Other facts:

- We upgraded from m1.large to m1.xlarge instances about two weeks ago.
- We are running Cassandra 1.1.9
- We've been doing some memory tuning, although I have seen this happen on 
untuned nodes.

Has anyone seen anything like this before?

Another, related question.  Once we see messages being dropped on one node,
our Cassandra client appears to see this as well and reports errors.  We use
LOCAL_QUORUM with an RF of 3 on all queries.  Any idea why clients would see
an error?  If only one node reports an error, shouldn't the consistency level
prevent the client from seeing an issue?
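
To spell out the arithmetic behind that expectation, here is a minimal sketch
in Python.  The function names are made up for illustration and are not part
of any Cassandra API.  It assumes the usual quorum formula of
floor(RF / 2) + 1, applied to the replication factor of the local data center:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a (LOCAL_)QUORUM request to succeed.

    Assumes the standard formula: floor(RF / 2) + 1.
    """
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be unresponsive while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


rf = 3  # our replication factor
print("replicas required:", quorum(rf))
print("replicas that may be down:", tolerated_failures(rf))
```

By that reasoning, with an RF of 3 a LOCAL_QUORUM request needs 2 replicas, so
I would expect it to survive a single unresponsive replica.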

Thanks for your help,
-Mike
