Nitin created CASSANDRA-6346:
--------------------------------

             Summary: Cassandra 2.0 server node runs out of memory during 
writes/replications
                 Key: CASSANDRA-6346
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6346
             Project: Cassandra
          Issue Type: Bug
            Reporter: Nitin


We are currently running an 18-node Cassandra cluster with 
NetworkTopologyStrategy replication (d1 = 3 and d2 = 3).

Our servers keep crashing with OOM exceptions. Our heap size is 8 GB. During 
one of the crashes I got hold of the hprof file and ran it through the Eclipse 
MAT analyzer.

After analyzing the hprof (please see the attachment for the top offenders), I 
found a LinkedBlockingQueue (from the mutation stage) that seems to hold about 
7.3 GB of the total 8 GB of heap.

After digging into the Cassandra 2.0 code, I see that every 
update/write/replication goes through stages, including the mutation stage, 
and that the number of threads that drain this queue (I am assuming this is 
the memtable-to-SSTable write) is controlled by concurrent_writes. Ours is set 
to 32.

However, we observe node crashes even when there are 0 client writes to the 
node and only replication requests are circulating in the cluster.

Are there any knobs to throttle the size of these queues, or the maximum 
number of write and replication requests a node can accept? What are the 
recommended settings for operating a Cassandra node in a mode where it rejects 
requests beyond a certain queue threshold?
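
The kind of behaviour I am asking about can be sketched as follows, again 
with plain java.util.concurrent rather than anything from Cassandra. The 
class name, the method name, the queue capacity of 1000 and the task count 
are hypothetical. A bounded queue combined with ThreadPoolExecutor.AbortPolicy 
makes the pool throw RejectedExecutionException once the queue is full instead 
of buffering without limit.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only: NOT a Cassandra setting or class.
class BoundedStageSketch {

    // Submits `tasks` stalled tasks and returns how many were rejected
    // because all workers were busy and the bounded queue was full.
    static int rejected(int workers, int queueCapacity, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor stage = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity), // hard cap
                new ThreadPoolExecutor.AbortPolicy());           // reject when full
        int rejectedCount = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    stage.execute(() -> {
                        try {
                            release.await(); // simulate a write that is stuck
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejectedCount++; // the caller could report "overloaded" here
                }
            }
            return rejectedCount;
        } finally {
            release.countDown(); // let the accepted tasks finish
            stage.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + rejected(32, 1000, 10_000));
    }
}
```

Here 32 tasks run, 1,000 wait in the queue, and the remaining 8,968 are 
rejected, so memory use stays bounded at the cost of refusing requests.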






--
This message was sent by Atlassian JIRA
(v6.1#6144)
