Thanks for all the responses; I have included the requested information below. As a side note, I think I have fixed the problem by setting "disk_access_mode: mmap_index_only". At the very least, none of the nodes has died since I set the option.
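For anyone who wants to try the same thing, the change is a one-line edit to cassandra.yaml on each node, followed by a restart. A minimal sketch, assuming the stock 0.7 config file, where the option defaults to auto:

```yaml
# conf/cassandra.yaml (fragment; file location assumed from the default install layout)
# Accepted values: auto, mmap, mmap_index_only, standard.
# mmap_index_only memory-maps only the index files and reads data files with standard I/O.
disk_access_mode: mmap_index_only
```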
> What kernel version are you running?

3/4 running 2.6.32-21-server, 1/4 running 2.6.35-23-server

> Also, you're virtualized (given %steal), right?

No, %steal is 0, these are all dedicated machines

> What filesystem are you using?

EXT4

> Are there any clues in /var/log/messages?

Nothing out of the ordinary

> How much swap space do you have configured?

2 GB and 24 GB of system memory.

Dan

From: Chris Goffinet [mailto:c...@chrisgoffinet.com]
Sent: December-20-10 17:32
To: user@cassandra.apache.org
Subject: Re: Severe Reliability Problems - 0.7 RC2

What kernel version are you running? I have seen with I/O intense nodes with 2.6.18 to 2.6.24 the kernel has a bug where it locks the JVM and spins to 100%.

On Mon, Dec 20, 2010 at 1:14 PM, Brandon Williams <dri...@gmail.com> wrote:

On Mon, Dec 20, 2010 at 2:13 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:

Yes, I have tried that (although only twice). Same impact as a regular kill: nothing happens and I get no stacktrace output. It is however on my list of things to try again the next time a node dies. I am also not able to attach jstack to the process.

Kill -3 will only produce output in foreground mode, jstack will work in either foreground or background.

-Brandon