I would recommend always running Cassandra with -XX:+HeapDumpOnOutOfMemoryError. This writes out a *.hprof heap dump file when the JVM throws an OutOfMemoryError.
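
In case it is useful, here is a minimal sketch of how the flag could be set, assuming your install builds its JVM options in conf/cassandra-env.sh through the JVM_OPTS variable. The dump directory below is a hypothetical path, not something from this thread; point it at a volume with enough free space to hold a full heap.

```shell
# Sketch of lines to append to conf/cassandra-env.sh (assumed location).
# Write a heap dump when the JVM throws an OutOfMemoryError.
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
# Hypothetical dump directory; adjust to a path that exists on your nodes.
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/lib/cassandra/heapdump"
```

Restart the node after the change so the new options take effect.
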
You can later analyze the hprof files using Eclipse Memory Analyzer (Eclipse MAT <http://www.eclipse.org/mat>) to figure out root causes and potential leaks.

Hope this helps

-- Nitin

On Thu, Jan 2, 2014 at 9:00 PM, Narendra Sharma <narendra.sha...@gmail.com> wrote:

> The root cause turned out to be high heap. The Linux OOM Killer
> (http://linux-mm.org/OOM_Killer) killed the process. It took some time to
> figure out but very interesting. We knew high heap is a problem but had no
> clue when the actual heap usage was well within limit and the process
> disappeared. syslog helped figure this out.
>
> About Linux OOM Killer
> "It is the job of the linux 'oom killer' to *sacrifice* one or more
> processes in order to free up memory for the system when all else fails"
>
> On Thu, Jan 2, 2014 at 10:38 AM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Thu, Jan 2, 2014 at 8:13 AM, Narendra Sharma <narendra.sha...@gmail.com> wrote:
>>
>>> 8 node cluster running in aws. Any pointers where I should start looking?
>>> No kill -9 in history.
>>
>> You should start looking at instructions as to how to upgrade to at least
>> the top of the 1.1 line... :D
>>
>> =Rob
>
> --
> Narendra Sharma
> Software Engineer
> *http://www.aeris.com <http://www.aeris.com>*
> *http://narendrasharma.blogspot.com/ <http://narendrasharma.blogspot.com/>*