@Vladimir
We tried with 12 GB and 16 GB; the problem eventually appeared with those
settings too. In this particular cluster we have 143 tables across 2
keyspaces.

@Alexander
We have one table with a max partition of 2.68 GB, one of 256 MB, a bunch
whose max varies roughly between 10 MB and 100 MB, and the rest with a max
below 10 MB.

On the biggest table, the 99th percentile is around 60 MB, the 98th around
25 MB, and the 95th around 5.5 MB. On the table with the 256 MB max, the
99th percentile is around 4.6 MB and the 98th around 2 MB. Could the top 1%
here really have that much impact?

We do write a lot to the biggest table and read from it quite often too,
but I have no way to know whether that big partition is ever read.

On Mon, Nov 21, 2016, at 01:09 PM, Alexander Dejanovski wrote:
> Hi Vincent,
>
> one of the usual causes of OOMs is very large partitions.
> Could you check your nodetool cfstats output in search of large
> partitions? If you find one (or more), run nodetool cfhistograms on
> those tables to get a view of the partition size distribution.
>
> Thanks
>
> On Mon, Nov 21, 2016 at 12:01 PM Vladimir Yudovin
> <vla...@winguzone.com> wrote:
>> Did you try any value in the range 8-20 GB (e.g. 60-70% of physical
>> memory)?
>> Also, how many tables do you have across all keyspaces? Each table
>> can consume a minimum of 1 MB of Java heap.
>>
>> Best regards, Vladimir Yudovin,
>> Winguzone [1] - Hosted Cloud Cassandra. Launch your cluster in
>> minutes.
>>
>> ---- On Mon, 21 Nov 2016 05:13:12 -0500 Vincent Rischmann
>> <m...@vrischmann.me> wrote ----
>>
>>> Hello,
>>>
>>> we have an 8-node Cassandra 2.1.15 cluster at work which has been
>>> giving us a lot of trouble lately.
>>>
>>> The problem is simple: nodes regularly die because of an out of
>>> memory exception, or because the Linux OOM killer decides to kill
>>> the process.
>>> A couple of weeks ago we increased the heap to 20 GB hoping it
>>> would solve the out of memory errors, but it didn't; instead of
>>> getting out of memory exceptions, the OOM killer killed the JVM.
>>>
>>> We reduced the heap on some nodes to 8 GB to see if it would work
>>> better, but some nodes crashed again with out of memory exceptions.
>>>
>>> I suspect some of our tables are badly modelled, which would cause
>>> Cassandra to allocate a lot of memory, but I don't know how to
>>> prove that and/or find which table is bad and which query is
>>> responsible.
>>>
>>> I tried looking at metrics in JMX, and tried profiling with Mission
>>> Control, but it didn't really help; it's possible I missed
>>> something because I have no idea what to look for exactly.
>>>
>>> Does anyone have some advice for troubleshooting this?
>>>
>>> Thanks.
> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com [2]

Links:
1. https://winguzone.com?from=list
2. http://www.thelastpickle.com/
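
(For reference, partition-size figures like the ones discussed above can be
pulled with the standard nodetool commands Alexander mentions; a minimal
sketch, where "my_keyspace" and "my_table" are placeholder names, not the
actual tables in this thread:

    # per-table summary, including "Compacted partition maximum bytes"
    # and "Compacted partition mean bytes"
    nodetool cfstats my_keyspace

    # percentile breakdown (50/75/95/98/99/Min/Max) of partition size
    # and cell count for a single table
    nodetool cfhistograms my_keyspace my_table

The heap sizes being compared are set in cassandra-env.sh, e.g.
MAX_HEAP_SIZE="8G" together with a matching HEAP_NEWSIZE.)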