Hi Vincent,

one of the usual causes of OOMs is very large partitions. Could you check your nodetool cfstats output for large partitions? If you find one (or more), run nodetool cfhistograms on those tables to get a view of the partition size distribution.
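For example, assuming a keyspace named ks and a table named events (placeholders, substitute your own names), something like this should surface the problem:

    nodetool cfstats ks.events
    # look at "Compacted partition maximum bytes" and
    # "Compacted partition mean bytes" in the output

    nodetool cfhistograms ks events
    # prints the partition size and cell count distributions as percentiles

Partitions reaching hundreds of megabytes or more at the top percentiles are a common source of heap pressure during reads and compaction.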
Thanks

On Mon, Nov 21, 2016 at 12:01 PM Vladimir Yudovin <[email protected]> wrote:

> Did you try any value in the range 8-20 GB (e.g. 60-70% of physical
> memory)? Also, how many tables do you have across all keyspaces? Each
> table can consume a minimum of 1 MB of Java heap.
>
> Best regards,
> Vladimir Yudovin
>
> Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
> Launch your cluster in minutes.
>
>
> ---- On Mon, 21 Nov 2016 05:13:12 -0500 Vincent Rischmann
> <[email protected]> wrote ----
>
> Hello,
>
> we have an 8-node Cassandra 2.1.15 cluster at work which has been
> giving us a lot of trouble lately.
>
> The problem is simple: nodes regularly die because of an out-of-memory
> exception, or because the Linux OOM killer decides to kill the process.
> A couple of weeks ago we increased the heap to 20 GB hoping it would
> solve the out-of-memory errors, but it didn't; instead of getting an
> out-of-memory exception, the OOM killer killed the JVM.
>
> We reduced the heap on some nodes to 8 GB to see if it would work
> better, but some nodes crashed again with an out-of-memory exception.
>
> I suspect some of our tables are badly modelled, which would cause
> Cassandra to allocate a lot of memory, but I don't know how to prove
> that and/or find which table is bad and which query is responsible.
>
> I tried looking at metrics in JMX, and tried profiling with Mission
> Control, but it didn't really help; it's possible I missed something
> because I have no idea what to look for exactly.
>
> Does anyone have advice for troubleshooting this?
>
> Thanks.

--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
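Regarding the 8-20 GB range suggested above: on Cassandra 2.1 the heap is configured in conf/cassandra-env.sh. A minimal sketch, with illustrative values rather than a recommendation:

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="8G"    # total JVM heap; leave room for off-heap structures
                          # and the OS page cache
    HEAP_NEWSIZE="800M"   # young generation; the file's own guidance is
                          # roughly 100 MB per physical CPU core

    # verify what a node is actually running with:
    nodetool info | grep Heap

Note that when the Linux OOM killer kills the process (as opposed to the JVM throwing an OutOfMemoryError), total resident memory (heap plus off-heap) exceeded physical RAM, so reducing the heap, as was tried here, is a reasonable direction.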
