For background, this thread discusses how to work it out for Cassandra: http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html
tl;dr you can work it out, or guess based on the tenured usage after a CMS collection.

> How can we know how the heap is being used, monitor it ?

My favourite is to turn on the GC logging in cassandra-env.sh. I can also recommend the GC coverage in this book http://amzn.com/0137142528

You can also use JConsole or anything else that reads the JVM metrics via JMX.

> Why have I that much memory used in the heap of my new servers ?

IMHO the m1.xlarge is the best EC2 node (apart from SSD) to use.

> I configured a 4G heap with a 200M "new size".

That is a *very* low new heap size. I would expect it to result in frequent premature promotion into the tenured heap, which will make it look like you are using more memory.

> That is the heap that was supposed to be used.
>
> Memtable : 1.4G (1/3 of the heap)
> Key cache : 0.1G (min(5% of Heap (in MB), 100MB))
> System : 1G (more or less, from datastax doc)
>
> So we are around 2.5G max in theory out of 3G usable (threshold 0.75 of the
> heap before flushing memtable because of pressure)

The memtable figure is the maximum value, reached only if all the memtables are full and the flush queue is full. It's not the working size used for memtables; the code tries to avoid ever hitting the maximum. I'm also not sure the 1G for "system" is still current, or what it's actually referring to.

I suggest:

* returning the configuration to the defaults.
* if you have a high number of rows, looking at the working set calculations linked above.
* monitoring the servers to look for triggers for the GC activity, such as compaction or repair.
* looking at your code base for read queries that read a lot of data. It may be writes, but it's often reads.
* if you are using the default compaction strategy, looking in the data model for rows that have a high number of deletes and/or overwrites over a long time. These can have a high tombstone count.

GC activity is relative to the workload. Try to find things that cause a lot of columns to be read from disk.
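As a quick sanity check of the arithmetic quoted above, here is the same budget worked through in a few lines of Python. This is only a sketch of the figures quoted in the mail (1/3 of heap for memtables, min(5% of heap, 100MB) for the key cache, the rough 1G "system" number, and the 0.75 pressure threshold), not how Cassandra actually accounts for memory:

```python
# Back-of-the-envelope heap budget from the quoted mail, for a 4G heap.
# All figures in MB; the formulas are the ones quoted, not Cassandra's code.

heap_mb = 4096

memtable_mb = heap_mb / 3                 # ~1365 MB, the "1.4G" above
key_cache_mb = min(0.05 * heap_mb, 100)   # 100 MB, the "0.1G" above
system_mb = 1024                          # the rough 1G "system" figure

usable_mb = 0.75 * heap_mb                # 3072 MB before the pressure flush
budget_mb = memtable_mb + key_cache_mb + system_mb

print(round(budget_mb))   # ~2489, the "around 2.5G" in the mail
print(round(usable_mb))   # 3072, the "3G usable"
```

The point stands that this budget is a worst case: the memtable term is a ceiling, not the steady-state working size.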
I've found the following JVM tweaks sometimes helpful:

MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="1200M"
SurvivorRatio=4
MaxTenuringThreshold=4

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/11/2012, at 10:26 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Does anybody have an answer to any of these questions ?
>
> Alain
>
>
> 2012/11/7 Hiller, Dean <dean.hil...@nrel.gov>
> +1, I am interested in this answer as well.
>
> From: Alain RODRIGUEZ <arodr...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, November 7, 2012 9:45 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Questions around the heap
>
> s application that heavily scans a particular column family, you would want
> to inhibit or disable the Bloom filter on the column family by setting it
> high"
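P.S. For anyone wondering where those tweaks and the GC logging live: roughly like this in conf/cassandra-env.sh. This is a sketch against the stock 1.x script; check your copy for the exact variable names, and adjust the log path to your install:

```shell
# Heap sizing -- cassandra-env.sh normally auto-calculates these; setting
# both overrides the calculation.
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="1200M"

# New-gen tuning from the mail, appended to the JVM options.
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"

# GC logging (standard HotSpot flags) -- the log shows tenured usage
# after each CMS cycle, which is what you estimate the working set from.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
```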