On Sat, Nov 6, 2010 at 4:51 PM, Reverend Chip <rev.c...@gmail.com> wrote:
> On 11/6/2010 1:48 PM, Jonathan Ellis wrote:
>> Did any of the nodes log any dropped messages?
>
> I didn't keep timestamps of the maintenance steps, so I will be unable
> to be sure which log entries correspond to which failure states. I did
> find dropped message log entries on node X.22, though. Here's the batch
> that happened more or less the time things went wrong:
>
> WARN [ScheduledTasks:1] 2010-11-05 17:15:03,294 MessagingService.java
> (line 515) Dropped 9122 messages in the last 1000ms
>
> Am I to understand that
> ring maintenance requests can just fail when partially complete, in the
> same manner as a regular insert might fail, perhaps due to inter-node
> RPC overflow?

Yes, in beta3 this can happen. This was fixed in CASSANDRA-1676.

> It would appear, then, that Cassandra isn't designed to be operated and
> understood without constant log watching of all nodes.

Not in beta, it's not. :)

(In fact I would recommend running beta nodes at debug log level so
when something goes wrong you have a better picture of what happened.)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com