Are you running with the default heap settings? What else is running on the boxes?

On Wed, Jun 22, 2011 at 9:06 AM, William Oberman <ober...@civicscience.com> wrote:

> I was wondering/I figured that /var/log/kern indicated the OS was killing
> java (versus an internal OOM).
>
> The nodetool repair is interesting. My application never deletes, so I
> didn't bother running it. But, if that helps prevent OOMs as well, I'll add
> it to the crontab....
>
> (plan A is still upgrading to 0.8.0).
>
> will
>
>
> On Wed, Jun 22, 2011 at 8:53 AM, Sasha Dolgy <sdo...@gmail.com> wrote:
>
>> Yes ... this is because it was the OS that killed the process, and
>> wasn't related to Cassandra "crashing". Reviewing our monitoring, we
>> saw that memory utilization was pegged at 100% for days and days
>> before it was finally killed because 'apt' was fighting for resource.
>> At least, that's as far as I got in my investigation before giving up,
>> moving to 0.8.0 and implementing 24hr nodetool repair on each node via
>> cronjob....so far ... no problems.
>>
>> On Wed, Jun 22, 2011 at 2:49 PM, William Oberman
>> <ober...@civicscience.com> wrote:
>> > Well, I managed to run 50 days before an OOM, so any changes I make will
>> > take a while to test ;-) I've seen the GCInspector log lines appear
>> > periodically in my logs, but I didn't see a correlation with the crash.
>> > I'll read the instructions on how to properly do a rolling upgrade today,
>> > practice on test, and try that on production first.
>> > will
>> >
>
>
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue., First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) ober...@civicscience.com
>

--
http://twitter.com/tjake