There have been a lot of discussions about GC tuning on the mailing list. Here's a quick set of guidelines I use; please search the mail archives if they don't answer your question.
If heavy GC activity correlates with Cassandra compaction, do one or more of:

* reduce concurrent_compactors to 2 or 3
* reduce compaction_throughput_mb_per_sec
* reduce in_memory_compaction_limit_in_mb

These are heavy-handed changes designed to get things under control; you will probably want to back some of them out later.

Enable GC logging in cassandra-env.sh and look at how much memory is still in use after a full GC / CMS collection. If that is more than about 50% of the heap you may end up doing a lot of GC.

If you have hundreds of millions of rows per node, on pre-1.2, increase bloom_filter_fp_chance on the CFs and increase index_interval in cassandra.yaml (i.e. sample the row index less often) to reduce JVM memory use.

If you have wide rows consider using (on 4 to 8 cores):

NEW_HEAP: 1000M
SurvivorRatio: 4
MaxTenuringThreshold: 4

Look at the tenuring distribution in the GC log to see how many ParNew passes objects survive. If most objects never make it past age 1 or 2, consider running with MaxTenuringThreshold 2; this can help reduce premature tenuring.

Rough sketches of these settings follow below. GC problems are a combination of workload and configuration, and sometimes take a while to sort out.
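For concreteness, the compaction throttles above live in cassandra.yaml; a rough sketch for the 1.1/1.2 line (values are illustrative, not recommendations, and defaults may differ by version):

    # cassandra.yaml -- rein in compaction while getting GC under control
    concurrent_compactors: 2               # default is roughly one per core
    compaction_throughput_mb_per_sec: 8    # default 16; 0 disables throttling entirely
    in_memory_compaction_limit_in_mb: 32   # default 64

If memory serves, the throughput can also be changed on a running node with nodetool setcompactionthroughput, which avoids a restart while you experiment.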
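To enable the GC logging mentioned above, cassandra-env.sh ships with a block of commented-out GC logging options you can switch on (or you can add them); roughly, using standard HotSpot flags, with the log path just an example:

    # cassandra-env.sh -- verbose GC logging
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

The heap "used" figure reported after each CMS cycle is the number to compare against the ~50% mark, and -XX:+PrintTenuringDistribution is what produces the per-age breakdown the tenuring advice refers to.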
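The wide-row settings map onto cassandra-env.sh roughly as follows; NEW_HEAP is the HEAP_NEWSIZE variable, the other two are existing JVM flags you edit rather than add, and the 8 GB heap is only assumed for illustration:

    # cassandra-env.sh -- young generation tuning for wide rows on 4-8 cores
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="1000M"                              # "NEW_HEAP" above
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"          # stock file sets 8
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"   # stock file sets 1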
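And for the pre-1.2 memory levers: a larger false positive chance means a smaller bloom filter, and a larger index_interval means fewer sampled row index entries held on heap. A hedged sketch (cassandra-cli syntax as I recall it for 1.0/1.1; "MyWideCF" is just a placeholder name):

    # cassandra-cli -- bigger fp chance => smaller bloom filter
    update column family MyWideCF with bloom_filter_fp_chance = 0.1;

    # cassandra.yaml -- sample the row index every 256 keys instead of the default 128
    index_interval: 256

Note that existing SSTables keep their old bloom filters until they are rewritten (by compaction, scrub or upgradesstables), so the change takes effect gradually.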
Hope that helps

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/04/2013, at 11:53 PM, Michael Theroux <mthero...@yahoo.com> wrote:

> Hello,
>
> Just to wrap up on my part of this thread, tuning the CMS occupancy threshold (-XX:CMSInitiatingOccupancyFraction) to 70 appears to have resolved my issues with the memory warnings. However, I don't believe this would be a solution to all of the issues mentioned below. It does make sense to me, though, to tune this value below the "flush_largest_memtables_at" value in cassandra.yaml, so a CMS collection kicks in before we start flushing memtables to free memory.
>
> Thanks!
> -Mike
>
> On Apr 23, 2013, at 12:47 PM, Haithem Jarraya wrote:
>
>> We are facing a similar issue, and we are not able to keep the ring stable.
>> We are using C* 1.2.3 on CentOS 6, 32 GB RAM, 8 GB heap, 6 nodes. The total data is ~84 GB (which is relatively small for C* to handle, with an RF of 3). Our application is read-heavy; we see the GC complaints on all nodes, and I copied and pasted the output below.
>> Also, we usually see much larger values for Pending on ReadStage; not sure what the best advice is for this.
>>
>> Thanks,
>>
>> Haithem
>>
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:02,118 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 911 ms for 1 collections, 5945542968 used; max is 8199471104
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:16,051 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 322 ms for 1 collections, 5639896576 used; max is 8199471104
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,829 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 2273 ms for 1 collections, 6762618136 used; max is 8199471104
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 53) Pool Name                 Active   Pending   Blocked
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 68) ReadStage                      4         4         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) RequestResponseStage           1         6         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) ReadRepairStage                0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) MutationStage                  0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) ReplicateOnWriteStage          0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) GossipStage                    0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) AntiEntropyStage               0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) MigrationStage                 0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) MemtablePostFlusher            0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) FlushWriter                    0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) MiscStage                      0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) commitlog_archiver             0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) InternalResponseStage          0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) AntiEntropySessions            0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) HintedHandoff                  0         0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,843 StatusLogger.java (line 73) CompactionManager              0         0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 85) MessagingService             n/a      15,1
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 95) Cache Type        Size         Capacity     KeysToSave     Provider
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 96) KeyCache          251658064    251658081    all
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 102) RowCache         0            0            all            org.apache.cassandra.cache.SerializingCacheProvider
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 109) ColumnFamily                Memtable ops,data
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.local                0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.peers                0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.batchlog             0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.NodeIdInfo           0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.LocationInfo         0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.Schema               0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.Migrations           0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.schema_keyspaces     0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.schema_columns       0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.schema_columnfamilies 0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.IndexInfo            0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.range_xfers          0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.peer_events          0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.hints                0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.HintsColumnFamily    0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo                       0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo2                      0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo3                      0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo4                      0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo5                      0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) x.foo6                      0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) x.foo7                      0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) system_auth.users           0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) system_traces.sessions      0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) system_traces.events        0,0
>> WARN [ScheduledTasks:1] 2013-04-23 16:40:30,850 GCInspector.java (line 142) Heap is 0.824762725573964 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,850 StorageService.java (line 3537) Unable to reduce heap usage since there are no dirty column families
>>
>>
>> On 23 April 2013 16:52, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>> We are using DSE, which I believe is also 1.1.9. We have basically had an unusable cluster for months due to this error. In our case, once it starts doing this it starts flushing memtables to disk and eventually fills up the disk to the point where it can't compact. If we catch it soon enough and restart the node, it can usually recover.
>>
>> In our case, the heap size is 12 GB. As I understand it, Cassandra will give 1/3 of that to memtables. I then noticed that we have one column family that is using nearly 4 GB in bloom filters on each node. Since the nodes will start doing this when the heap reaches 9 GB, we essentially have only 1 GB of free memory, so when compactions, cleanups, etc. take place this situation starts happening. We are working to change our data model to try to resolve this.
>>
>> Ralph
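(Regarding Ralph's 4 GB of bloom filters: a quick way to see which column families are responsible is nodetool cfstats, which reports a per-CF filter size; the exact label may vary a little by version.)

    # per-CF bloom filter memory on a node
    nodetool -h localhost cfstats | egrep 'Column Family:|Bloom Filter Space Used'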
>> On Apr 19, 2013, at 8:00 AM, Michael Theroux wrote:
>>
>> > Hello,
>> >
>> > We've recently upgraded from m1.large to m1.xlarge instances on AWS to handle additional load, but also to relieve memory pressure. It appears to have accomplished both; however, we are still getting a warning, 0-3 times a day, on our database nodes:
>> >
>> > WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line 145) Heap is 0.7529240824406468 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>> >
>> > This is happening much less frequently than before the upgrade, but after essentially doubling the amount of available memory, I'm curious what I can do to determine what is happening during this time.
>> >
>> > I am collecting all the JMX statistics. Memtable space is elevated but not extraordinarily high. No GC messages are being output to the log.
>> >
>> > These warnings do seem to be occurring during compactions of column families using LCS with wide rows, but I'm not sure there is a direct correlation.
>> >
>> > We are running Cassandra 1.1.9, with a maximum heap of 8G.
>> >
>> > Any advice?
>> > Thanks,
>> > -Mike
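For reference, the change Michael describes at the top of the thread (starting CMS before the flush_largest_memtables_at threshold is reached) looks roughly like this; the occupancy flags are already present in the stock cassandra-env.sh, so it is mostly a matter of lowering the value:

    # cassandra-env.sh -- start CMS at 70% heap occupancy (stock setting is 75)
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

    # cassandra.yaml -- emergency flush threshold, as a fraction of the max heap (default 0.75)
    flush_largest_memtables_at: 0.75

Keeping the occupancy fraction below flush_largest_memtables_at gives CMS a chance to free memory before Cassandra starts dumping memtables.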