I'm running a big test: ten nodes with 3 TB of disk each, on
0.7.0rc1.  After some tuning help (thanks, Tyler), much of this is working
as it should.  However, a serious event occurred as well: a server
froze up, and although mutations were dropped, no error was reported to
the client.  Here's what the log said on host X.19:

 WARN [ScheduledTasks:1] 2010-12-06 14:04:11,125 MessagingService.java
(line 527) Dropped 76 MUTATION messages in the last 5000ms

Meanwhile, on the OTHER nodes, gossip decided the node was not available
for a while:

 INFO [ScheduledTasks:1] 2010-12-06 14:04:02,396 Gossiper.java (line
195) InetAddress /X.19 is now dead.
 INFO [GossipStage:1] 2010-12-06 14:04:06,127 Gossiper.java (line 569)
InetAddress /X.19 is now UP
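
In case it helps anyone line these events up, here is a rough Python sketch that puts the three log lines above on one timeline. The regular expressions are inferred only from the lines quoted in this message, the names (`parse`, `summarize`, `SAMPLE_LINES`) are made up for illustration, and it assumes the clocks on the different hosts are reasonably in sync, which I have not verified.

```python
import re
from datetime import datetime, timedelta

# Layout inferred from the quoted lines: LEVEL [thread] timestamp File.java (line N) message
LOG_LINE = re.compile(
    r"^\s*(?P<level>\w+) \[(?P<thread>[^\]]+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<source>\S+) \(line (?P<line>\d+)\) (?P<msg>.*)$"
)
DROPPED = re.compile(r"Dropped (\d+) (\w+) messages in the last (\d+)ms")
GOSSIP = re.compile(r"InetAddress (\S+) is now (dead|UP)")

# The log lines from this message, with the mail wrapping undone.
SAMPLE_LINES = [
    " WARN [ScheduledTasks:1] 2010-12-06 14:04:11,125 MessagingService.java "
    "(line 527) Dropped 76 MUTATION messages in the last 5000ms",
    " INFO [ScheduledTasks:1] 2010-12-06 14:04:02,396 Gossiper.java "
    "(line 195) InetAddress /X.19 is now dead.",
    " INFO [GossipStage:1] 2010-12-06 14:04:06,127 Gossiper.java "
    "(line 569) InetAddress /X.19 is now UP",
]


def parse(line):
    """Return (timestamp, message) for a log line, or None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("msg")


def summarize(lines):
    """Build a time-sorted list of (timestamp, description) events."""
    events = []
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, msg = parsed
        dropped = DROPPED.search(msg)
        gossip = GOSSIP.search(msg)
        if dropped:
            # The warning covers the window that ends at its own timestamp.
            window = timedelta(milliseconds=int(dropped.group(3)))
            label = f"{dropped.group(1)} {dropped.group(2)}"
            events.append((ts - window, f"drop window opens ({label})"))
            events.append((ts, "drop window closes"))
        elif gossip:
            events.append((ts, f"{gossip.group(1)} marked {gossip.group(2)}"))
    return sorted(events)


if __name__ == "__main__":
    for ts, description in summarize(SAMPLE_LINES):
        print(ts.strftime("%H:%M:%S.%f")[:-3], description)
```

Feeding in the full logs from every node instead of `SAMPLE_LINES` should show whether the drop windows line up with the periods when gossip considered the node dead.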

And even though I was writing with consistency=ALL, none of my
clients reported any errors on their mutations.

Tyler has this information, but I would like to know whether anyone has
seen this before and/or has a diagnosis.
