There have been a lot of discussions about GC tuning on this mailing list. Here's 
a quick set of guidelines I use; please search the mail archive if it does not 
answer your question. 

If heavy GC activity correlates with Cassandra compaction, do one or more of the 
following (a sketch of the yaml settings is below):
* reduce concurrent_compactors to 2 or 3
* reduce compaction_throughput_mb_per_sec
* reduce in_memory_compaction_limit_in_mb

These are heavy-handed changes designed to get things under control; you will 
probably want to revert some of them later. 
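
A minimal sketch of what those cassandra.yaml settings might look like (the values 
are illustrative, not recommendations for every workload):

    concurrent_compactors: 2              # limit parallel compactions
    compaction_throughput_mb_per_sec: 8   # throttle compaction IO (default 16)
    in_memory_compaction_limit_in_mb: 32  # rows above this use the slower two-pass compaction (default 64)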

Enable GC logging in cassandra-env.sh and look at how much memory is in use 
after a full GC / CMS collection. If this is more than 50% of the heap you may end 
up doing a lot of GC. If you have hundreds of millions of rows per node on pre-1.2, 
raise bloom_filter_fp_chance on the CF's (a higher false-positive chance means 
smaller filters) and raise index_interval in the yaml to reduce JVM memory use. 
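
If your cassandra-env.sh still has the stock GC logging lines commented out, 
uncommenting something like the following is usually enough (standard HotSpot 
flags; the log path is just an example):

    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

The "used" figure reported after a CMS cycle is the number to compare against 50% 
of the heap.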

If you have wide rows consider using (on 4 to 8 cores):
NEW_HEAP: 1000M
SurvivorRatio: 4
MaxTenuringThreshold: 4
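
A sketch of how those typically land in cassandra-env.sh (assuming the stock 
1.1/1.2 env script, which expects MAX_HEAP_SIZE and HEAP_NEWSIZE to be set as a 
pair and already sets SurvivorRatio=8 and MaxTenuringThreshold=1, so edit those 
lines rather than adding duplicates):

    MAX_HEAP_SIZE="8G"       # whatever your current heap is
    HEAP_NEWSIZE="1000M"     # the young generation (NEW_HEAP above)
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"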

Look at the tenuring distribution in the GC log to see how many ParNew passes 
objects survive. If most of the surviving objects are at tenuring age 1 or 2, 
consider running with MaxTenuringThreshold 2. This can help reduce the amount of 
premature tenuring. 
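
With -XX:+PrintTenuringDistribution enabled (see the logging sketch above), each 
ParNew logs something roughly like this (numbers invented for illustration):

    Desired survivor size 41943040 bytes, new threshold 4 (max 4)
    - age   1:   31457280 bytes,   31457280 total
    - age   2:    2097152 bytes,   33554432 total

If the per-age byte counts fall off sharply after age 1 or 2, lowering 
MaxTenuringThreshold to 2 in cassandra-env.sh is the change described above.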

GC problems are a combination of workload and configuration, and sometimes take 
a while to sort out. 

Hope that helps 
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/04/2013, at 11:53 PM, Michael Theroux <mthero...@yahoo.com> wrote:

> Hello,
> 
> Just to wrap up on my part of this thread, tuning the CMS initiating occupancy 
> threshold (-XX:CMSInitiatingOccupancyFraction) to 70 appears to have resolved my 
> issues with the memory warnings.  However, I don't believe this would be a 
> solution to all the issues mentioned below.  Although, it does make sense to me 
> to tune this value below the "flush_largest_memtables_at" value in cassandra.yaml 
> so a CMS collection will kick in before we start flushing memtables to free memory.
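> 
> For reference, a sketch of the cassandra-env.sh lines involved (assuming the 
> stock env script, which already sets both flags, with 75 as the shipped default):
> 
>     JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
>     JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
> 
> UseCMSInitiatingOccupancyOnly keeps the JVM from overriding that fraction with 
> its own heuristic.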
> 
> Thanks!
> -Mike
> 
> On Apr 23, 2013, at 12:47 PM, Haithem Jarraya wrote:
> 
>> We are facing a similar issue, and we are not able to keep the ring stable.  
>> We are using C* 1.2.3 on CentOS 6, 32GB RAM, 8GB heap, 6 nodes.
>> The total data is ~84GB (which is relatively small for C* to handle, with an RF 
>> of 3).  Our application is read-heavy; we see the GC complaints on all 
>> nodes, I copied and pasted the output below.
>> Also we usually see much larger values for the pending ReadStage, and I'm not 
>> sure what the best advice is for this.
>> 
>> Thanks,
>> 
>> Haithem
>>  
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:02,118 GCInspector.java (line 119) 
>> GC for ConcurrentMarkSweep: 911 ms for 1 collections, 5945542968 used; max 
>> is 8199471104
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:16,051 GCInspector.java (line 119) 
>> GC for ConcurrentMarkSweep: 322 ms for 1 collections, 5639896576 used; max 
>> is 8199471104
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,829 GCInspector.java (line 119) 
>> GC for ConcurrentMarkSweep: 2273 ms for 1 collections, 6762618136 used; max 
>> is 8199471104
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 53) 
>> Pool Name                    Active   Pending   Blocked
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 68) 
>> ReadStage                         4         4         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
>> RequestResponseStage              1         6         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
>> ReadRepairStage                   0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
>> MutationStage                     0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
>> ReplicateOnWriteStage             0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
>> GossipStage                       0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
>> AntiEntropyStage                  0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
>> MigrationStage                    0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
>> MemtablePostFlusher               0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) 
>> FlushWriter                       0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) 
>> MiscStage                         0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) 
>> commitlog_archiver                0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) 
>> InternalResponseStage             0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) 
>> AntiEntropySessions               0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) 
>> HintedHandoff                     0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,843 StatusLogger.java (line 73) 
>> CompactionManager                 0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 85) 
>> MessagingService                n/a      15,1
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 95) 
>> Cache Type                     Size                 Capacity               
>> KeysToSave                                                         Provider
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 96) 
>> KeyCache                  251658064                251658081                 
>>      all     
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 
>> 102) RowCache                          0                        0            
>>           all              
>> org.apache.cassandra.cache.SerializingCacheProvider
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 
>> 109) ColumnFamily                Memtable ops,data
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 
>> 112) system.local                              0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 
>> 112) system.peers                              0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 
>> 112) system.batchlog                           0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 
>> 112) system.NodeIdInfo                         0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 
>> 112) system.LocationInfo                       0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 
>> 112) system.Schema                             0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 
>> 112) system.Migrations                         0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 
>> 112) system.schema_keyspaces                   0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 
>> 112) system.schema_columns                     0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) 
>> system.schema_columnfamilies                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 
>> 112) system.IndexInfo                          0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 
>> 112) system.range_xfers                        0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 
>> 112) system.peer_events                        0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 
>> 112) system.hints                              0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 
>> 112) system.HintsColumnFamily                  0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 
>> 112) x.foo                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 
>> 112) x.foo2                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 
>> 112) x.foo3                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 
>> 112) x.foo4                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 
>> 112) x.foo5                      0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 
>> 112) x.foo6                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 
>> 112) x.foo7                     0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 
>> 112) system_auth.users                         0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 
>> 112) system_traces.sessions                    0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 
>> 112) system_traces.events                      0,0
>>  WARN [ScheduledTasks:1] 2013-04-23 16:40:30,850 GCInspector.java (line 142) 
>> Heap is 0.824762725573964 full.  You may need to reduce memtable and/or 
>> cache sizes.  Cassandra will now flush up to the two largest memtables to 
>> free up memory.  Adjust flush_largest_memtables_at threshold in 
>> cassandra.yaml if you don't want Cassandra to do this automatically
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,850 StorageService.java (line 
>> 3537) Unable to reduce heap usage since there are no dirty column families
>> 
>> 
>> 
>> 
>> On 23 April 2013 16:52, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>> We are using DSE, which I believe is also 1.1.9.  We have basically had an 
>> unusable cluster for months due to this error.  In our case, once it starts 
>> doing this it starts flushing sstables to disk and eventually fills up the disk 
>> to the point where it can't compact.  If we catch it soon enough and restart 
>> the node it can usually recover.
>> 
>> In our case, the heap size is 12 GB.  As I understand it, Cassandra will give 
>> 1/3 of that to memtables.  I then noticed that we have one column family that 
>> is using nearly 4GB in bloom filters on each node.  Since the nodes will start 
>> doing this when the heap reaches 9GB, we essentially only have 1GB of free 
>> memory, so when compactions, cleanups, etc. take place this situation starts 
>> happening.  We are working to change our data model to try to resolve this.
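>> 
>> (For what it's worth, bloom filter size can also be reduced per CF without a 
>> data model change by raising bloom_filter_fp_chance; a hedged cassandra-cli 
>> sketch, with an illustrative CF name:
>> 
>>     update column family MyWideRowCF with bloom_filter_fp_chance = 0.1;
>> 
>> The new value only applies to sstables written afterwards, so existing data 
>> needs a rebuild via nodetool scrub or upgradesstables.)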
>> 
>> Ralph
>> 
>> On Apr 19, 2013, at 8:00 AM, Michael Theroux wrote:
>> 
>> > Hello,
>> >
>> > We've recently upgraded from m1.large to m1.xlarge instances on AWS to 
>> > handle additional load, but also to relieve memory pressure.  It appears 
>> > to have accomplished both; however, we are still getting a warning, 0-3 
>> > times a day, on our database nodes:
>> >
>> > WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line 
>> > 145) Heap is 0.7529240824406468 full.  You may need to reduce memtable 
>> > and/or cache sizes.  Cassandra will now flush up to the two largest 
>> > memtables to free up memory.  Adjust flush_largest_memtables_at threshold 
>> > in cassandra.yaml if you don't want Cassandra to do this automatically
>> >
>> > This is happening much less frequently than before the upgrade, but after 
>> > essentially doubling the amount of available memory, I'm curious what I 
>> > can do to determine what is happening during this time.
>> >
>> > I am collecting all the JMX statistics.  Memtable space is elevated but 
>> > not extraordinarily high.  No GC messages are being output to the log.
>> >
>> > These warnings do seem to be occurring during compactions of column 
>> > families using LCS with wide rows, but I'm not sure there is a direct 
>> > correlation.
>> >
>> > We are running Cassandra 1.1.9, with a maximum heap of 8G.
>> >
>> > Any advice?
>> > Thanks,
>> > -Mike
>> 
>> 
> 
