The main diagnostic signs of the problem I was seeing are very high system
CPU with no user CPU utilization (check with top or sar -u), vmstat showing
one process waiting for run time but never seeming to get it, a high page
scan rate, and no Cassandra error messages (although nodes dying did *seem*
to correlate with flushing memtables and compaction). I am also using a
64-bit kernel.
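
If you would rather log the user/system split than watch top, here is a minimal sketch that derives it from two samples of the aggregate "cpu" line in /proc/stat. It assumes Linux and the usual field order (user, nice, system, idle, iowait, irq, softirq); the function names are made up for illustration and are not part of any tool mentioned here.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of jiffies."""
    fields = line.split()
    # Assumed field order, as documented for Linux /proc/stat.
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]
    return dict(zip(names, (int(v) for v in fields[1:8])))


def cpu_shares(before, after):
    """Return (user %, system %) of the CPU time elapsed between two samples."""
    delta = {k: after[k] - before[k] for k in before}
    total = sum(delta.values())
    if total == 0:
        return 0.0, 0.0
    user = 100.0 * (delta["user"] + delta["nice"]) / total
    system = 100.0 * (delta["system"] + delta["irq"] + delta["softirq"]) / total
    return user, system


def read_cpu_sample(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the all-CPU total."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__":
    first = read_cpu_sample()
    time.sleep(5)  # sampling interval is arbitrary
    user, system = cpu_shares(first, read_cpu_sample())
    print("user %.1f%%  system %.1f%%" % (user, system))
```

The pattern I described would show up as a system figure near 100% with user near 0%.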

 

I was having nodes die every few hours, but ever since I switched from mmap
(auto = mmap on 64-bit) to mmap_index_only, things have been rock-solid
reliable. No downtime in 48+ hours. You haven't really provided enough
information to determine whether you are having the same problem I was, but
if you think so, I would recommend you at least try switching to
mmap_index_only.
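
For reference, the change is a one-line edit. This fragment assumes the setting lives in conf/cassandra.yaml under the key disk_access_mode, which is where I believe 0.7 keeps it; check your own config file for the exact name and location before relying on this.

```yaml
# Assumed key name and file (conf/cassandra.yaml in 0.7).
# "auto" resolves to mmap on a 64-bit JVM; mmap_index_only maps only
# the index files and reads data files with standard I/O.
disk_access_mode: mmap_index_only
```

The node needs a restart for the new mode to take effect.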

 

Can one of the Cassandra devs, or anybody who knows about memory mapping,
comment on my particular mmap situation? I have been thinking about it,
and the start of my problems seemed to correlate with my active dataset and
single sstable sizes growing beyond the amount of free system memory (12 GB;
my nodes have 24 GB total with 12 GB for the Cassandra heap). Does memory
mapping somehow force the data to stay in memory, or prevent that memory from
being reclaimed for other purposes? Google does not turn up any nice simple
answers.
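
To make the question concrete, here is a small sketch of one way to watch mapped pages show up in a process's resident size. It is Linux-only (it reads /proc/self/statm), it uses a throwaway temp file rather than an sstable, and the function names are hypothetical. It only shows that touched pages count toward resident size; it does not by itself show whether the kernel can reclaim them under pressure, which is the part I am unsure about.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def resident_pages():
    """Resident set size of this process in pages (Linux /proc/self/statm)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1])


def touch_mapped_file(size_bytes):
    """Map a scratch file read-only and read one byte from every page.

    Returns (resident pages before, resident pages after, pages touched).
    """
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * size_bytes)
        tmp.flush()
        with mmap.mmap(tmp.fileno(), size_bytes, access=mmap.ACCESS_READ) as m:
            before = resident_pages()
            touched = 0
            for offset in range(0, size_bytes, PAGE):
                m[offset]  # reading a byte faults the page into the mapping
                touched += 1
            after = resident_pages()
    return before, after, touched


if __name__ == "__main__":
    before, after, touched = touch_mapped_file(64 * 1024 * 1024)
    print("touched %d pages, resident grew by %d pages"
          % (touched, after - before))
```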

 

Dan

 

From: Christopher Kung [mailto:chris.k...@gmail.com] 
Sent: December-22-10 4:09
To: user@cassandra.apache.org
Subject: Cassandra Node Routinely Goes Down - 0.7 RC2

 

Hey All,

 

I have been having problems running 0.7 RC2 where one of my two nodes
routinely goes down. Sometimes both of them go down. I am running the nodes
on Ubuntu Lucid LTS 64-bit with kernel version 2.6.32. Currently, both
nodes are running on micro instances on EC2. I will eventually migrate to
large instances, but I can't seem to get Cassandra to stay up for more than
1 day at a time.

 

I saw another post recently where someone else was having a similar
problem, and the solution was to change the disk access mode to
mmap_index_only rather than auto. However, the machines are 64-bit, despite
being underpowered, so I don't see why that's necessary. I checked my logs
and there are no error messages. Are the nodes just running into resource
issues?

 

Thanks.

 

Chris

 
