Sent from my iPhone
> On Mar 3, 2017, at 12:18 PM, Shravan Ch <chall...@outlook.com> wrote:
>
> Hello,
>
> More than 30 Cassandra servers in the primary DC went down with the OOM
> exception below. What puzzles me is the scale at which it happened (all
> within the same minute). I will share some more details below.
>
> System Log: http://pastebin.com/iPeYrWVR
> GC Log: http://pastebin.com/CzNNGs0r
>
> During the OOM I saw a lot of warnings like the one below (these had been
> there for quite some time, maybe weeks):
>
> WARN [SharedPool-Worker-81] 2017-03-01 19:55:41,209 BatchStatement.java:252
> - Batch of prepared statements for [keyspace.table] is of size 225455,
> exceeding specified threshold of 65536 by 159919.
>
> Environment:
> We are using Apache Cassandra 2.1.9 on a multi-DC cluster: a primary DC
> (more C* nodes, on SSD; the apps run here) and a secondary DC
> (geographically remote, more like a DR site for the primary) on SAS drives.
>
> Cassandra config:
>
> Java 1.8.0_65
> Garbage Collector: G1GC
> memtable_allocation_type: offheap_objects
>
> Since this OOM I am seeing huge hint pile-ups on the majority of the nodes,
> and the pending hints keep going up. I have increased HintedHandoff
> CoreThreads to 6, but that did not help (I admit I tried this on only one
> node).
>
> nodetool compactionstats -H
> pending tasks: 3
>    compaction type   keyspace   table   completed      total    unit   progress
>         Compaction     system   hints     28.5 GB   92.38 GB   bytes     30.85%
>
> Appreciate your inputs here.
>
> Thanks,
> Shravan
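
For anyone triaging the batch warnings quoted above, here is a minimal sketch for pulling the numbers out of such a line. It assumes the warning appears in system.log as a single line with exactly the wording quoted in the message. The regex and the helper name parse_batch_warning are illustrative only, not part of Cassandra. It only checks that the reported excess equals size minus threshold and shows how far over the threshold the batch was.

```python
import re

# Regex for the batch-size warning, modelled on the one line quoted in the
# message above (assumption: other versions may word the warning differently).
BATCH_WARN = re.compile(
    r"Batch of prepared statements for \[(?P<tables>[^\]]+)\] "
    r"is of size (?P<size>\d+), "
    r"exceeding specified threshold of (?P<threshold>\d+) by (?P<excess>\d+)"
)


def parse_batch_warning(line):
    """Return (tables, size, threshold, excess), or None if the line does not match.

    Hypothetical helper written for this sketch.
    """
    m = BATCH_WARN.search(line)
    if m is None:
        return None
    return (
        m.group("tables"),
        int(m.group("size")),
        int(m.group("threshold")),
        int(m.group("excess")),
    )


# The warning from the quoted message, joined back onto one line.
sample = (
    "WARN [SharedPool-Worker-81] 2017-03-01 19:55:41,209 BatchStatement.java:252 "
    "- Batch of prepared statements for [keyspace.table] is of size 225455, "
    "exceeding specified threshold of 65536 by 159919."
)

tables, size, threshold, excess = parse_batch_warning(sample)
ratio = size / threshold  # how many times larger than the threshold the batch was

print(f"{tables}: size={size} threshold={threshold} excess={excess} ratio={ratio:.2f}")
```

Running the same helper over every line of system.log and counting matches per table would show which tables receive the oversized batches and how often.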