I had figured that the entries in /var/log/kern meant the OS was killing java (as opposed to an internal OOM).
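For anyone wanting to check the same thing on their own node, here is a minimal sketch that scans a kernel log for OOM-killer entries. It assumes the kernel writes lines containing "Killed process <pid> (<name>)", which is the usual shape but can differ between kernel versions. The function name find_oom_kills is made up for illustration, and the log path is passed on the command line rather than assumed.

```python
import re
import sys

# Assumed message shape: "... Killed process <pid> (<name>) ..."
# The exact wording varies by kernel version, so adjust the pattern if needed.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return (pid, process_name) for each line that looks like an OOM-killer kill."""
    kills = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


if __name__ == "__main__":
    # Usage: python find_oom_kills.py <path-to-kernel-log>
    with open(sys.argv[1], errors="replace") as log:
        for pid, name in find_oom_kills(log):
            print(f"kernel killed pid {pid} ({name})")
```

If java shows up in the output, the process was killed by the OS rather than dying from an OutOfMemoryError inside the JVM.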
The nodetool repair is interesting. My application never deletes, so I didn't bother running it. But, if that helps prevent OOMs as well, I'll add it to the crontab.... (plan A is still upgrading to 0.8.0).

will

On Wed, Jun 22, 2011 at 8:53 AM, Sasha Dolgy <sdo...@gmail.com> wrote:
> Yes ... this is because it was the OS that killed the process, and
> wasn't related to Cassandra "crashing". Reviewing our monitoring, we
> saw that memory utilization was pegged at 100% for days and days
> before it was finally killed because 'apt' was fighting for resource.
> At least, that's as far as I got in my investigation before giving up,
> moving to 0.8.0 and implementing 24hr nodetool repair on each node via
> cronjob....so far ... no problems.
>
> On Wed, Jun 22, 2011 at 2:49 PM, William Oberman
> <ober...@civicscience.com> wrote:
> > Well, I managed to run 50 days before an OOM, so any changes I make will
> > take a while to test ;-) I've seen the GCInspector log lines appear
> > periodically in my logs, but I didn't see a correlation with the crash.
> > I'll read the instructions on how to properly do a rolling upgrade today,
> > practice on test, and try that on production first.
> > will
>

--
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com