[ https://issues.apache.org/jira/browse/CASSANDRA-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis resolved CASSANDRA-6346.
---------------------------------------
    Resolution: Not A Problem

The main knob to turn to make load shedding more aggressive is to reduce rpc_write_timeout. (See CASSANDRA-6059)

> Cassandra 2.0 server node runs out of memory during writes/replications
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6346
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6346
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Nitin
>         Attachments: LinkedBlockingQ.png
>
>
> Currently we are running an 18-node Cassandra cluster with NetworkTopologyStrategy replication (d1 = 3 and d2 = 3).
> Our servers crash with OOM exceptions; the heap size is 8 GB. When a node crashed, I got hold of the .hprof file and ran it through the Eclipse MAT analyzer.
> After analyzing the hprof (please see the attachment for the top offenders), I find a LinkedBlockingQueue (from the mutation stage) that holds about 7.3 GB of the 8 GB heap.
> After deep-diving into the Cassandra 2.0 code, I see that every update/write/replication goes through stages, including the mutation stage, and the number of threads that drain this queue (I am assuming the memtable-to-SSTable write) is controlled by concurrent_writes. Ours is set to 32.
> However, we observe node crashes even when there are 0 client writes to the node but replication requests are floating around the cluster.
> Any ideas what knobs throttle the size of these queues and the maximum number of write/replication requests a node can receive? What are the recommended settings to operate a Cassandra node in a mode where it rejects requests beyond a certain queue threshold?

--
This message was sent by Atlassian JIRA
(v6.1#6144)
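A minimal cassandra.yaml sketch of the knobs discussed above, assuming the 2.0 option names (the rpc_write_timeout mentioned in the resolution surfaces in the yaml as write_request_timeout_in_ms); the values shown are the shipped defaults, not recommendations:

    # Coordinator/replica write timeout. Lowering it makes load shedding more
    # aggressive: mutations that have waited longer than this are dropped
    # instead of continuing to accumulate on the heap.
    write_request_timeout_in_ms: 2000

    # Threads servicing the mutation stage (the queue seen in the heap dump).
    concurrent_writes: 32

Dropped mutations are expected to be reconciled later via hinted handoff and repair, which is why shedding them is considered acceptable rather than a problem to fix by buffering more.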
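To make the memory behavior concrete, here is a small self-contained Java sketch (illustrative only, not Cassandra's actual stage implementation): a fixed pool of "concurrent_writes" workers behind an unbounded LinkedBlockingQueue buffers every incoming mutation on the heap, while a bounded queue with a rejection policy sheds the excess instead.

    import java.util.concurrent.*;

    // Illustration only: shows why a fixed worker pool in front of an
    // unbounded LinkedBlockingQueue lets heap usage grow when mutations
    // arrive faster than they are applied, and how a bounded queue with a
    // rejection policy sheds load instead.
    public class MutationStageSketch {
        public static void main(String[] args) {
            int concurrentWrites = 32; // analogous to concurrent_writes

            // Unbounded queue: the backlog (and heap usage) has no limit.
            ExecutorService unbounded = new ThreadPoolExecutor(
                    concurrentWrites, concurrentWrites,
                    0L, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<>());          // no capacity bound

            // Bounded queue: beyond the cap, new work is rejected (shed).
            ExecutorService bounded = new ThreadPoolExecutor(
                    concurrentWrites, concurrentWrites,
                    0L, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<>(10_000),     // hard cap on backlog
                    new ThreadPoolExecutor.AbortPolicy()); // throws when full

            Runnable slowMutation = () -> {
                try { Thread.sleep(5); } catch (InterruptedException ignored) { }
            };

            for (int i = 0; i < 100_000; i++) {
                unbounded.submit(slowMutation);            // all 100k sit in memory
                try {
                    bounded.submit(slowMutation);          // shed once the cap is hit
                } catch (RejectedExecutionException e) {
                    // load shedding: drop the mutation rather than buffer it
                }
            }
            unbounded.shutdownNow();
            bounded.shutdownNow();
        }
    }

Cassandra itself sheds somewhat differently: a mutation that has already sat in the queue longer than the write timeout is dropped when a worker dequeues it, which is why reducing that timeout (per the resolution above) limits how much stale work can pile up on the heap.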