For background, this thread discusses working set sizing for Cassandra:
http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html

tl;dr: you can work it out, or estimate it from the tenured-generation usage after a CMS collection. 

> How can we know how the heap is being used, monitor it ?
My favourite is to turn on GC logging in cassandra-env.sh. 
I can also recommend the GC coverage in this book: http://amzn.com/0137142528
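For reference, cassandra-env.sh ships with GC logging options commented out; uncommenting them is usually all it takes. A sketch (the log path is just an example, put it wherever suits your boxes):

```shell
# In cassandra-env.sh: append the GC logging flags to the JVM options.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
```

The tenuring distribution output in particular will show you whether objects are being promoted before they have had a chance to die young.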

You can also use JConsole or anything else that reads the JVM metrics via JMX.
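Cassandra exposes JMX on port 7199 by default, so pointing JConsole at a node is just (hostname is a placeholder):

```shell
# Connect JConsole to a node's JMX port; needs a running node and a display.
jconsole somehost:7199
```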

> Why have I that much memory used in the heap of my new servers ?
IMHO the m1.xlarge is the best EC2 instance type to use (SSD-backed instances aside). 

>  I configured a 4G heap with a 200M "new size".

That is a *very* low new heap size. I would expect it to result in frequent 
premature promotion into the tenured generation, which will make it look like 
you are using more memory.


> That is the heap that was supposed to be used.
> 
> Memtable  : 1.4G (1/3 of the heap)
> Key cache : 0.1G (min(5% of Heap (in MB), 100MB))
> System     : 1G     (more or less, from datastax doc)
> 
> So we are around 2.5G max in theory out of 3G usable (threshold 0.75 of the 
> heap before flushing memtable because of pressure)
The memtable usage is the maximum value, reached only if all the memtables are 
full and the flush queue is full. It's not the working size used for memtables; 
the code tries to avoid ever hitting the maximum. 
Not sure if the 1G for "system" is still current or what it's actually 
referring to.
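As a sanity check, the budget quoted above works out like this. A rough sketch, assuming a 4096 MB heap and integer arithmetic (the variable names are mine):

```shell
heap_mb=4096
memtable_mb=$(( heap_mb / 3 ))                # memtable ceiling: 1/3 of the heap
key_cache_mb=$(( heap_mb * 5 / 100 ))         # 5% of the heap...
if [ "$key_cache_mb" -gt 100 ]; then key_cache_mb=100; fi   # ...capped at 100 MB
system_mb=1024                                # "more or less", per the DataStax docs
flush_threshold_mb=$(( heap_mb * 75 / 100 ))  # memtables start flushing at 0.75 of the heap
total_mb=$(( memtable_mb + key_cache_mb + system_mb ))
echo "${total_mb} MB budgeted of ${flush_threshold_mb} MB usable"   # 2489 MB of 3072 MB
```

So roughly 2.5G of 3G usable, matching the figures above; the point is that the memtable third is a ceiling, not a steady-state number.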


I suggest:
* returning the configuration to the defaults.
* if you have a high number of rows, looking at the working set calculations 
linked above.
* monitoring the servers to look for triggers for the GC activity, such as 
compaction or repair.
* looking at your code base for read queries that read a lot of data. It may be 
the writes, but it's often the reads.
* if you are using the default compaction strategy, looking at the data model 
for rows that have a high number of deletes and/or overwrites over a long time. 
These can have a high tombstone count. 
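For the monitoring point, the stock nodetool commands are usually enough to catch the triggers in the act. A sketch (hostname, keyspace and CF names are placeholders; needs a running node):

```shell
nodetool -h localhost compactionstats   # is a big compaction running during the GC spikes?
nodetool -h localhost tpstats           # are pending reads / flushes backing up under pressure?
nodetool -h localhost cfhistograms Keyspace1 Standard1   # wide reads show up as large column counts per read
```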

GC activity is relative to the workload. Try to find things that cause a lot of 
columns to be read from disk.
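One cheap way to correlate those with GC, once GC logging is on, is to grep the log for long stop-the-world pauses and line their timestamps up against compaction/repair activity. A sketch, assuming the HotSpot `-XX:+PrintGCApplicationStoppedTime` line format, demonstrated against a fabricated sample log:

```shell
# Build a tiny sample log so the filter can be shown end to end.
gc_log=$(mktemp)
cat > "$gc_log" <<'EOF'
Total time for which application threads were stopped: 0.0123456 seconds
Total time for which application threads were stopped: 0.7891234 seconds
EOF

# Keep only pauses longer than half a second; the duration is the
# second-to-last field on the line.
long_pauses=$(awk '/application threads were stopped/ && $(NF-1) > 0.5' "$gc_log" | wc -l)
echo "long pauses: $long_pauses"

rm -f "$gc_log"
```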

I've found the following JVM tweaks sometimes helpful:

MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="1200M"
SurvivorRatio=4
MaxTenuringThreshold=4
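For completeness, those map onto cassandra-env.sh like this (the first two are variables the script already defines; the last two go in as raw JVM flags):

```shell
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="1200M"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"
```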

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/11/2012, at 10:26 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Does anybody have an answer to any of these questions?
> 
> Alain
> 
> 
> 2012/11/7 Hiller, Dean <dean.hil...@nrel.gov>
> +1, I am interested in this answer as well.
> 
> From: Alain RODRIGUEZ <arodr...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, November 7, 2012 9:45 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Questions around the heap
> 
> s application that heavily scans a particular column family, you would want 
> to inhibit or disable the Bloom filter on the column family by setting it 
> high"
> 
