> I can rule out the first 3. I was running cassandra with default settings,
> i.e. 1GB heap and 256M memtable. So, with 3 memtables+1GB the JVM should run
> with >1.75G (although http://wiki.apache.org/cassandra/MemtableThresholds
> suggests increasing heap size only conservatively). Did so. 4GB machine with
> 2GB 64bit-JVM seemed to run stable for quite some time but then also crashed
> with OOM. Looking at the heap dump it's always the same: all memory nearly
> always bound in CompactionExecutor (ColumnFamilyStore/ConcurrentSkipListMap,
> respectively).
Note that the memtables are just data structures in memory and live in the heap; they are not in addition to the heap. mmap()ed files, however, will be. For debugging purposes you may want to switch Cassandra to "standard" IO mode instead of mmap. This carries a performance penalty, but the virtual/resident sizes won't be polluted with mmap()ed data.

In general, unless you're hitting something particularly strange, or just a bug in Cassandra, you shouldn't be randomly getting OOMs unless you are truly using that heap space. What do you mean by "always bound in CompactionExecutor"? By what method did you determine this to be the case? Note that memtables are ConcurrentSkipListMaps, so they are expected to contain a lot of data (and the maps themselves may be significant if lots of small items are stored).

> This looks like somebody else recently has had a similar problem (bottom
> line: more heap, which is okay, but I'd like to understand why):
> http://www.mail-archive.com/user@cassandra.apache.org/msg07516.html

There should be no magic need for memory. Unless you are severely taxing the node, with a very high write load or similar, an out-of-the-box configured Cassandra should need only limited amounts of memory. Did you run with the default memtable thresholds (memtable_throughput_in_mb in particular)? How many column families do you have? You need memory for *each* column family, taking into account its memtable thresholds.
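To make the per-column-family arithmetic concrete, here is a rough back-of-the-envelope sketch. The factor of three memtables in flight, the skip-list overhead multiplier, and the fixed base are all illustrative assumptions, not measured Cassandra internals:

```python
def rough_heap_needed_mb(num_column_families,
                         memtable_throughput_in_mb=256,
                         memtables_in_flight=3,
                         overhead_factor=2.0,
                         base_mb=256):
    """Very rough worst-case heap estimate, in MB.

    Assumes each column family can have several memtables alive at once
    (one active plus some pending flush), and that the in-memory
    ConcurrentSkipListMap representation costs more than the raw data
    (overhead_factor). base_mb covers caches, compaction and the rest of
    the JVM. All of these numbers are illustrative assumptions.
    """
    per_cf = memtable_throughput_in_mb * memtables_in_flight * overhead_factor
    return num_column_families * per_cf + base_mb

# Even a single CF at the 256M default can plausibly want well over 1GB:
print(rough_heap_needed_mb(1))  # 1 * 256 * 3 * 2.0 + 256 = 1792.0
```

The point of the sketch is only that the default 1GB heap leaves little slack once flush-pending memtables and per-entry overhead are counted, which is consistent with the advice to raise the heap when raising memtable thresholds.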
> This is my only CF currently in use (via JMX):
>
> - column_families:
>     - column_type: Standard
>       comment: tracking column family
>       compare_with: org.apache.cassandra.db.marshal.UTF8Type
>       default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
>       gc_grace_seconds: 864000
>       key_cache_save_period_in_seconds: 3600
>       keys_cached: 200000.0
>       max_compaction_threshold: 32
>       memtable_flush_after_mins: 60
>       min_compaction_threshold: 4
>       name: tracking
>       read_repair_chance: 1.0
>       row_cache_save_period_in_seconds: 0
>       rows_cached: 0.0
>   name: test
>   replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
>   replication_factor: 3

This is the only column family being used?

> In addition... actually there is plenty of free memory on the heap (?):
>
> 3605.478: [GC 3605.478: [ParNew
> Desired survivor size 2162688 bytes, new threshold 1 (max 1)
> - age 1: 416112 bytes, 416112 total
> : 16887K->553K(38336K), 0.0209550 secs]3605.499: [CMS:
> 1145267K->447565K(2054592K), 1.9143630 secs] 1161938K->447565K(2092928K),
> [CMS Perm : 18186K->18158K(30472K)], 1.9355340 secs] [Times: user=1.95
> sys=0.00, real=1.94 secs]
> 3607.414: [Full GC 3607.414: [CMS: 447565K->447453K(2054592K), 1.9694960
> secs] 447565K->447453K(2092928K), [CMS Perm : 18158K->18025K(30472K)],
> 1.9696450 secs] [Times: user=1.92 sys=0.00, real=1.97 secs]

1.9 seconds to do [CMS: 1145267K->447565K(2054592K)] is completely abnormal if that represents a pause (but not if it's just concurrent mark/sweep time). I don't quite recognize the format of this log... I'm suddenly unsure what this log output is coming from.
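Whatever produced that particular format, the `[Times: ... real=N.NN secs]` suffix is enough to scan a log for long events. A minimal sketch (the one-second threshold is an arbitrary choice, and the sample lines are abbreviated stand-ins for real log entries):

```python
import re

# Matches the "real=1.94 secs" part of a HotSpot GC log "[Times: ...]" suffix.
REAL_TIME = re.compile(r"real=(\d+\.\d+) secs")

def long_gc_events(log_lines, threshold_secs=1.0):
    """Return (line, real_seconds) pairs for events whose wall-clock time
    exceeds threshold_secs. Note this flags long *events*; for CMS phases
    a long event may be concurrent work rather than an actual
    stop-the-world pause, so each hit still needs a human look."""
    hits = []
    for line in log_lines:
        m = REAL_TIME.search(line)
        if m and float(m.group(1)) > threshold_secs:
            hits.append((line, float(m.group(1))))
    return hits

sample = [
    "3605.478: [GC ... [Times: user=1.95 sys=0.00, real=1.94 secs]",
    "22.634: [GC ... [Times: user=0.02 sys=0.00, real=0.00 secs]",
]
print(long_gc_events(sample))  # flags only the 1.94s event
```

This does not answer whether the 1.9s above was a pause, but it makes it easy to see how often such events occur across a full log.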
A normal -XX:+PrintGC and -XX:+PrintGCDetails should yield stuff like:

22.634: [GC 22.634: [ParNew: 19136K->2112K(19136K), 0.0058490 secs] 123155K->108012K(202688K), 0.0059050 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
22.660: [GC 22.660: [ParNew: 19136K->2112K(19136K), 0.0063220 secs] 125036K->109893K(202688K), 0.0063910 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
22.686: [GC 22.686: [ParNew: 19136K->2112K(19136K), 0.0067150 secs] 126917K->111864K(202688K), 0.0067730 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
22.724: [CMS-concurrent-abortable-preclean: 0.041/0.257 secs] [Times: user=0.24 sys=0.01, real=0.26 secs]
22.724: [GC[YG occupancy: 19037 K (19136 K)]22.724: [Rescan (parallel) , 0.0085230 secs]22.732: [weak refs processing, 0.0142440 secs] [1 CMS-remark: 109752K(183552K)] 128790K(202688K), 0.0228750 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
22.747: [CMS-concurrent-sweep-start]
22.749: [GC 22.749: [ParNew: 19136K->2112K(19136K), 0.0082060 secs] 128813K->113837K(202688K), 0.0082620 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
22.772: [GC 22.772: [ParNew: 19136K->1500K(19136K), 0.0064660 secs] 116825K->101174K(202688K), 0.0065400 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
22.804: [CMS-concurrent-sweep: 0.041/0.057 secs] [Times: user=0.12 sys=0.00, real=0.05 secs]
22.804: [CMS-concurrent-reset-start]
22.816: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
22.910: [GC 22.910: [ParNew: 18524K->2112K(19136K), 0.0024290 secs] 98398K->81987K(202688K), 0.0024860 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
22.938: [GC 22.938: [ParNew: 19136K->2112K(19136K), 0.0077940 secs] 99011K->83980K(202688K), 0.0078510 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
22.965: [GC 22.965: [ParNew: 19136K->2112K(19136K), 0.0063160 secs] 101004K->85959K(202688K), 0.0063760 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]

Here you can clearly see what constitutes stop time and what is just the timing of CMS activity in the background. Yours seems to be somewhere in between; I vaguely recognize it, but I don't remember what produces it...

Anyway, it certainly looks like you have reasonable heap usage going on, as expected. I'm concerned since you reported that independent JVMs like nodetool suffered OOMs as well; that really indicates there is a real problem of some kind.

-- 
/ Peter Schuller