Hello,


we have an 8-node Cassandra 2.1.15 cluster at work that has been giving
us a lot of trouble lately.


The problem is simple: nodes regularly die, either with a
java.lang.OutOfMemoryError or because the Linux OOM killer decides to
kill the process.
A couple of weeks ago we increased the heap to 20 GB, hoping it would
solve the out-of-memory errors, but it didn't: instead of getting
OutOfMemoryError exceptions, the OOM killer killed the JVM.
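For reference, this is roughly how I've been confirming that it was the
OOM killer and not the JVM (a sketch; on some distros dmesg needs root,
and journalctl -k or /var/log/kern.log work as well):

```shell
# The kernel log records each OOM-killer kill ("Out of memory: Kill process ...").
dmesg 2>/dev/null | grep -iE 'out of memory|killed process' \
  || echo "no OOM killer events in the kernel ring buffer"
```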


We reduced the heap on some nodes to 8 GB to see if that would work
better, but some of those nodes crashed again with OutOfMemoryError.


I suspect some of our tables are badly modelled, which would cause
Cassandra to allocate a lot of memory, but I don't know how to prove
that, or how to find which table is bad and which query is responsible.
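One thing I was planning to try is scanning `nodetool cfstats` output
for tables with oversized partitions; the 100 MB threshold below is
just my guess, and the script assumes the "Table:" / "Compacted
partition maximum bytes:" lines that cfstats prints in 2.1:

```shell
# Flag tables whose largest compacted partition exceeds ~100 MB (guessed threshold).
nodetool cfstats | awk '
  /Table:/ { table = $NF }
  /Compacted partition maximum bytes:/ {
    if ($NF + 0 > 100 * 1024 * 1024) print table, $NF
  }
'
```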


I tried looking at metrics over JMX, and tried profiling with Java
Mission Control, but neither really helped; it's possible I missed
something, because I don't know exactly what to look for.
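If it would help, I can also capture a heap dump for offline analysis
(e.g. in Eclipse MAT). This is roughly how I'd do it; the pgrep pattern
and dump path are just my assumptions:

```shell
# Dump the live heap of the Cassandra JVM for offline analysis.
CASSANDRA_PID=$(pgrep -f CassandraDaemon | head -n1 || true)
if [ -n "$CASSANDRA_PID" ]; then
  jmap -dump:live,format=b,file=/tmp/cassandra-heap.hprof "$CASSANDRA_PID"
else
  echo "no Cassandra process found on this host"
fi
# To get a dump automatically at the next OOM, cassandra-env.sh could add:
#   JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/cassandra"
```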


Does anyone have advice for troubleshooting this?



Thanks.
