I'm inclined to think there's a bug in your client, then. DEBUG-level logs could confirm or refute this: for each insert they should show how many replicas the write is blocking for, which nodes it got responses from, and whether a TimedOutException (from not getting ALL replies) was returned to the client.
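To make concrete what those logs would be expected to show, here is a toy model of the check. It is a sketch only, not Cassandra's code: `block_for`, `check_write`, `TimedOutError` and the node labels are all hypothetical names made up for illustration. With replication_factor 3 and consistency ALL, a write should block for 3 replicas, and anything short of 3 acks before the timeout should surface to the client as an error.

```python
# Toy model only; every name here is hypothetical, not Cassandra's API.

class TimedOutError(Exception):
    """Stands in for the TimedOutException a client should receive."""


def block_for(consistency, replication_factor):
    """Number of replica acks a write has to wait for."""
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError("unknown consistency level: %s" % consistency)


def check_write(consistency, replication_factor, acked_nodes):
    """Return the ack count, or raise if too few replicas replied in time."""
    needed = block_for(consistency, replication_factor)
    got = len(set(acked_nodes))  # count each responding node once
    if got < needed:
        raise TimedOutError("blocked for %d replicas, got %d" % (needed, got))
    return got
```

If the DEBUG output for an insert shows a block-for count of 3 and only two responding nodes, yet the client saw no exception, that would point at the client swallowing the error.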
On Tue, Dec 7, 2010 at 2:37 PM, Reverend Chip <rev.c...@gmail.com> wrote:
> No, I'm afraid that's not it:
> replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
> replication_factor: 3
>
> On 12/7/2010 6:37 AM, Jonathan Ellis wrote:
>> If you are using NetworkTopologyStrategy you are probably hitting
>> https://issues.apache.org/jira/browse/CASSANDRA-1804 which is fixed in
>> rc2.
>>
>> On Mon, Dec 6, 2010 at 6:58 PM, Reverend Chip <rev.c...@gmail.com> wrote:
>>> I'm running a big test -- ten nodes with 3T disk each. I'm using
>>> 0.7.0rc1. After some tuning help (thanks Tyler) lots of this is working
>>> as it should. However a serious event occurred as well -- the server
>>> froze up -- and though mutations were dropped, no error was reported to
>>> the client. Here's what the log said on host X.19:
>>>
>>> WARN [ScheduledTasks:1] 2010-12-06 14:04:11,125 MessagingService.java
>>> (line 527) Dropped 76 MUTATION messages in the last 5000ms
>>>
>>> Meanwhile, on the OTHER nodes, gossip decided the node was not available
>>> for a while:
>>>
>>> INFO [ScheduledTasks:1] 2010-12-06 14:04:02,396 Gossiper.java (line
>>> 195) InetAddress /X.19 is now dead.
>>> INFO [GossipStage:1] 2010-12-06 14:04:06,127 Gossiper.java (line 569)
>>> InetAddress /X.19 is now UP
>>>
>>> And despite the fact that I was writing with consistency=ALL, none of my
>>> clients reported any errors on their mutations.
>>>
>>> Tyler has this information but I would like to know if anyone has seen
>>> this before, and/or has a diagnosis.
>>>
>>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com