OOM on Apache Cassandra on 30 Plus node at the same time

Shravan Ch Fri, 03 Mar 2017 09:19:04 -0800

Hello,

More than 30 plus Cassandra servers in the primary DC went down OOM exception 
below. What puzzles me is the scale at which it happened (at the same minute). 
I will share some more details below.


System Log: http://pastebin.com/iPeYrWVR
GC Log: http://pastebin.com/CzNNGs0r

<http://pastebin.com/CzNNGs0r>During the OOM I saw lot of WARNings like the 
below (these were there for quite sometime may be weeks)
WARN  [SharedPool-Worker-81] 2017-03-01 19:55:41,209 BatchStatement.java:252 - 
Batch of prepared statements for [keyspace.table] is of size 225455, exceeding 
specified threshold of 65536 by 159919.

Environment:
We are using ApacheCassandra-2.1.9 on Multi DC cluster. Primary DC (more C* 
nodes on SSD and apps run here)  and secondary DC (geographically remote and 
more like a DR to primary) on SAS drives.
Cassandra config:

Java 1.8.0_65
Garbage Collector: G1GC
memtable_allocation_type: offheap_objects

Post this OOM I am seeing huge hints pile up on majority of the nodes and the 
pending hints keep going up. I have increased HintedHandoff CoreThreads to 6 
but that did not help (I admit that I tried this on one node to try).

nodetool compactionstats -H
pending tasks: 3
compaction type            keyspace                          table   completed  
    total    unit   progress
        Compaction              system                          hints     28.5 
GB   92.38 GB   bytes     30.85%


Appreciate your inputs here.

Thanks,
Shravan

OOM on Apache Cassandra on 30 Plus node at the same time

Reply via email to