On 11/6/2010 1:48 PM, Jonathan Ellis wrote:
> On Fri, Nov 5, 2010 at 8:03 PM, Chip Salzenberg <rev.c...@gmail.com> wrote:
>> In the below "nodetool ring" output, machine 18 was told to loadbalance over
>> an hour ago.  It won't actually leave the ring.  When I first told it to
>> loadbalance, the cluster was under heavy write load; I've turned off the
>> write load, but the node won't actually leave, still.  Help?
>
> What version is the cluster on?
You mean, the Cassandra version?  0.7 beta3.

> Did any of the nodes log any dropped messages?

I didn't keep timestamps of the maintenance steps, so I can't be sure which log entries correspond to which failure states.  I did find dropped-message log entries on node X.22, though.  Here's the batch that happened at more or less the time things went wrong:

 WARN [ScheduledTasks:1] 2010-11-05 17:15:03,294 MessagingService.java (line 515) Dropped 9122 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:05,434 MessagingService.java (line 515) Dropped 16658 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:07,084 MessagingService.java (line 515) Dropped 2167 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:09,371 MessagingService.java (line 515) Dropped 28011 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:11,111 MessagingService.java (line 515) Dropped 1139 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:13,330 MessagingService.java (line 515) Dropped 1203 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:15,241 MessagingService.java (line 515) Dropped 4494 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:16,925 MessagingService.java (line 515) Dropped 2277 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:18,839 MessagingService.java (line 515) Dropped 17376 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:23,385 MessagingService.java (line 515) Dropped 18714 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:25,261 MessagingService.java (line 515) Dropped 18952 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:29,006 MessagingService.java (line 515) Dropped 25137 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:30,859 MessagingService.java (line 515) Dropped 1 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:34,418 MessagingService.java (line 515) Dropped 2580 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:35,816 MessagingService.java (line 515) Dropped 4317 messages in the last 1000ms

I looked for similar messages on node X.21 but didn't find any.

It seems that node states can become weird or wedged -- bordering on internally inconsistent -- and that cleanup operations on the order of "shut down the node manually and force-remove it from the ring" are commonplace.  I hope I'm missing something.  Am I to understand that ring maintenance requests can simply fail when partially complete, in the same manner as a regular insert might, perhaps due to inter-node RPC overflow?

> Any other error or warning messages?

"Cannot provide an optimal BloomFilter" several times, and "Schema definitions were defined both locally and in cassandra.yaml" on startup.

>> (It also collected 3.6G of load even though automatic bootstrapping is
>> disabled -- but this node had belonged to the cluster before, so maybe
>> cleaning out /var/lib/cassandra/* wasn't enough to prevent the node from
>> rejoining and taking data responsibility?)
>
> Assuming that contains both commitlog and data directories, that
> should do it.  You can tell by what it logs when it first starts up,
> if it's asking other nodes to send it data.

It would appear, then, that Cassandra isn't designed to be operated and understood without constant log watching of all nodes.
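For reference, the directories the "cleaning out /var/lib/cassandra/*" step has to cover are whatever cassandra.yaml points at.  In 0.7 the relevant settings look roughly like the stock defaults below -- this is from memory of the shipped config, not from this cluster, so check your own yaml (saved_caches is easy to overlook when wiping a node):

```yaml
# Fully resetting a node means clearing everything these settings
# point at, not just the data directory.
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
```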
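Since the dropped-message warnings all share one format, at least the "log watching" part can be scripted.  Here's a minimal sketch that totals the drops per minute of log time -- the parser and the log path are mine, not anything Cassandra ships, and the regex assumes the log4j pattern shown in the excerpt above:

```python
import re
from collections import defaultdict

# Matches the WARN lines quoted above:
#   "WARN [thread] 2010-11-05 17:15:03,294 MessagingService.java (line 515)
#    Dropped 9122 messages in the last 1000ms"
# Group 1 is the timestamp truncated to the minute; group 2 is the count.
DROP_RE = re.compile(
    r"WARN .*?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d+ "
    r"MessagingService\.java \(line \d+\) Dropped (\d+) messages"
)

def dropped_per_minute(lines):
    """Total the dropped-message counts per minute of log time."""
    totals = defaultdict(int)
    for line in lines:
        m = DROP_RE.search(line)
        if m:
            totals[m.group(1)] += int(m.group(2))
    return dict(totals)

# Typical use (the log path is an assumption, not from this thread):
#   with open("/var/log/cassandra/system.log") as f:
#       print(dropped_per_minute(f))
```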