> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600

This does not say that the heap is full. ParNew is GC activity for the new heap (the young generation), which is typically a smaller part of the overall heap.
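If you want to see the new generation filling and being collected separately from the rest of the heap while the scripts run, jstat from the JDK shows the generation occupancies live. A rough sketch, assuming the JDK tools are on the node and <cassandra-pid> is a placeholder for the Cassandra process id:

    # eden, survivor and old gen utilisation as a % of capacity, sampled every 5s
    jstat -gcutil <cassandra-pid> 5000

    # absolute generation capacities in KB (NGC* columns = new gen, OGC* = old gen)
    jstat -gccapacity <cassandra-pid>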
It sounds like you are running with defaults for the memory config, which is generally a good idea. But 4GB total memory for a node is on the small side.

Try some changes: edit the cassandra-env.sh file and change

    MAX_HEAP_SIZE="2G"
    HEAP_NEWSIZE="400M"

You may also want to try:

    MAX_HEAP_SIZE="2G"
    HEAP_NEWSIZE="800M"
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"

The size of the new heap generally depends on the number of cores available; see the comments in the cassandra-env.sh file.

An older discussion about memory use (note that in 1.2 the bloom filters and compression metadata are off heap now):
http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 11:06 PM, Joel Samuelsson <samuelsson.j...@gmail.com> wrote:

> You're right, it's probably hard. I should have provided more data.
>
> I'm running Ubuntu 10.04 LTS with JNA installed. I believe this line in the log indicates that JNA is working, please correct me if I'm wrong:
> CLibrary.java (line 111) JNA mlockall successful
>
> Total amount of RAM is 4GB.
>
> My description of data size was very bad. Sorry about that. Data set size is 12.3 GB per node, compressed.
>
> Heap size is 998.44MB according to nodetool info.
> Key cache is 49MB according to nodetool info.
> Row cache size is 0 bytes according to nodetool info.
> Max new heap is 205MB according to the Memory Pool "Par Eden Space" max in jconsole.
> Memtable is left at default, which should give it 333MB according to the documentation (I am uncertain where I can verify this).
>
> Our production cluster seems similar to your dev cluster, so possibly increasing the heap to 2GB might help our issues.
>
> I am still interested in getting rough estimates of how much heap will be needed as data grows. Other than empirical studies, how would I go about getting such estimates?
>
>
> 2013/4/16 Viktor Jevdokimov <viktor.jevdoki...@adform.com>
> How could one provide any help without any knowledge of your cluster, node and environment settings?
>
> 40GB was calculated from 2 nodes with RF=2 (each has 100% of the data range), 2.4-2.5M rows * 6 cols * 3kB, as a minimum without compression and any overhead (sstables, bloom filters and indexes).
>
> With a ParNew GC time such as yours, even if it is a swapping issue, I could only say that the heap size is too small.
>
> Check heap, new heap, memtable and cache sizes. Are you on Linux? Is JNA installed and used? What is the total amount of RAM?
>
> Just for a DEV environment we use 3 virtual machines with 4GB RAM and a 2GB heap, without any GC issues, with data from 0 to 16GB compressed on each node. Memtable space is sized to 100MB, new heap to 400MB.
>
> Best regards / Pagarbiai
> Viktor Jevdokimov
> Senior Developer
>
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>
> From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
> Sent: Tuesday, April 16, 2013 12:52
> To: user@cassandra.apache.org
> Subject: Re: Reduce Cassandra GC
>
> How do you calculate the heap / data size ratio? Is this a linear ratio?
>
> Each node has slightly more than 12 GB right now though.
>
> 2013/4/16 Viktor Jevdokimov <viktor.jevdoki...@adform.com>
> For >40GB of data, 1GB of heap is too low.
>
> Best regards / Pagarbiai
> Viktor Jevdokimov
> Senior Developer
>
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>
> From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
> Sent: Tuesday, April 16, 2013 10:47
> To: user@cassandra.apache.org
> Subject: Reduce Cassandra GC
>
> Hi,
>
> We have a small production cluster with two nodes. The load on the nodes is very small, around 20 reads/sec and about the same for writes. There are around 2.5 million keys in the cluster and an RF of 2.
>
> About 2.4 million of the rows are skinny (6 columns) and around 3kB in size each. Currently, scripts are running, accessing all of the keys in time order to do some calculations.
>
> While running the scripts, the nodes go down and then come back up 6-7 minutes later. This seems to be due to GC. I get lines like this in the log:
> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600
>
> However, the heap is not full. The heap usage has a jagged pattern, going from 60% up to 70% over 5 minutes and then back down to 60% over the next 5 minutes, and so on. I get no "Heap is X full..." messages. Every once in a while, at one of these peaks, I get one of these stop-the-world GCs for 6-7 minutes. Why does GC take up so much time even though the heap isn't full?
>
> I am aware that my access patterns make the key cache hit rate very unlikely to be high. And indeed, my average key cache hit ratio during the run of the scripts is around 0.5%. I tried disabling key caching on the accessed column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the cassandra-cli, but I get the same behaviour. Is turning the key cache off effective immediately?
>
> Stop-the-world GC is fine if it happens for a few seconds, but having it for several minutes doesn't work. Any other suggestions to remove these pauses?
>
> Best regards,
> Joel Samuelsson
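On the key cache question at the end of the quoted thread: one way to check whether the caching change has taken effect is to watch the key cache figures that nodetool info already reports, the same numbers quoted above. A rough sketch, assuming the node is reachable on localhost and using the same column family name as in the thread:

    # inside cassandra-cli, the statement already used in the thread:
    #   UPDATE COLUMN FAMILY cf WITH caching=none;

    # then from the shell; the Key Cache line shows size, capacity and recent hit rate
    nodetool -h localhost info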