Yes, this is because it was the OS that killed the process; it wasn't related to Cassandra "crashing". Reviewing our monitoring, we saw that memory utilization was pegged at 100% for days and days before the process was finally killed because 'apt' was fighting for resources. At least, that's as far as I got in my investigation before giving up, moving to 0.8.0, and implementing a 24hr nodetool repair on each node via cronjob. So far, no problems.
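
For anyone wanting to try the same thing, here is a rough sketch of what a daily repair job could look like. This is not the exact script from our nodes: the file name `daily_repair.py`, its path, the 03:00 schedule, and the `localhost` default are all assumptions for illustration. The only external command it relies on is `nodetool -h <host> repair`.

```python
import subprocess


def build_repair_command(host="localhost"):
    """Return the argv list for a nodetool repair against one node."""
    return ["nodetool", "-h", host, "repair"]


def run_repair(host="localhost"):
    """Run the repair and return nodetool's exit code (0 means success)."""
    return subprocess.run(build_repair_command(host)).returncode


def main():
    # Hypothetical cron entry, once a day at 03:00 on each node:
    #   0 3 * * * cd /opt/scripts && python3 -c "import daily_repair; daily_repair.main()"
    raise SystemExit(run_repair())
```

Staggering the start time across nodes is probably a good idea so that the repairs don't all run at once, but that part is a guess on my side and worth testing on a non-production ring first.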
On Wed, Jun 22, 2011 at 2:49 PM, William Oberman <ober...@civicscience.com> wrote:
> Well, I managed to run 50 days before an OOM, so any changes I make will
> take a while to test ;-) I've seen the GCInspector log lines appear
> periodically in my logs, but I didn't see a correlation with the crash.
> I'll read the instructions on how to properly do a rolling upgrade today,
> practice on test, and try that on production first.
>
> will