> I can rule out the first 3. I was running Cassandra with default settings,
> i.e. 1GB heap and a 256MB memtable. So, with 3 memtables + 1GB the JVM should
> run with >1.75GB (although http://wiki.apache.org/cassandra/MemtableThresholds
> suggests increasing the heap size only gradually). Did so. A 4GB machine with
> a 2GB 64-bit JVM seemed to run stably for quite some time but then also
> crashed with OOM. Looking at the heap dump it's always the same: nearly all
> memory is bound in the CompactionExecutor
> (ColumnFamilyStore/ConcurrentSkipListMap, respectively).

Note that the memtables are just data structures in memory and live in
the heap; they are not in addition to it. mmap()ed files, on the other
hand, do add to the process size outside the heap.
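
If you want to see the distinction from inside a JVM, here is a
minimal sketch using the standard java.lang.management API (nothing
Cassandra-specific; the same numbers are exposed over JMX, e.g. via
jconsole, so you don't actually have to run extra code):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapVsMapped {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        // Memtables are ordinary objects and live here, in the heap:
        System.out.println("heap used=" + heap.getUsed()
                + " committed=" + heap.getCommitted()
                + " max=" + heap.getMax());
        // mmap()ed sstables will NOT show up in these numbers; they
        // only inflate the virtual/resident size reported by top.
    }
}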

For debugging purposes you may want to switch Cassandra to "standard"
IO mode instead of mmap. This carries a performance penalty, but the
virtual/resident sizes won't be polluted with mmap()ed data.
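
Assuming you are on a 0.7-style cassandra.yaml (which the config you
paste below suggests), that should be a one-line change:

disk_access_mode: standard

(The other accepted values are auto, mmap and mmap_index_only; I'm
going from memory on those, so double-check against your yaml.)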

In general, unless you're hitting something particularly strange or
just a bug in Cassandra, you shouldn't be getting seemingly random
OOMs unless you are truly using that heap space. What do you mean by
"always bound in CompactionExecutor" - by what method did you
determine this to be the case?
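
If you haven't already, the stock JDK tools are the easiest way to
take and browse a dump; <pid> and the file name below are
placeholders:

jmap -dump:live,format=b,file=cassandra.hprof <pid>
jhat cassandra.hprof

Or load the .hprof into something like Eclipse MAT, which will show
you what actually dominates the retained heap rather than just what
the individual allocations look like.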

Note that memtables are ConcurrentSkipListMaps, so they are expected
to contain a lot of data (and the maps themselves may be significant
if lots of small items are stored).
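
If you want a feel for that overhead, here is a small self-contained
sketch (illustrative only - this is not Cassandra's actual memtable
layout):

import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListOverhead {
    public static void main(String[] args) {
        ConcurrentSkipListMap<String, byte[]> map =
                new ConcurrentSkipListMap<String, byte[]>();
        // A million small entries: the skip list's per-entry node and
        // index objects can cost as much memory as the payload itself.
        for (int i = 0; i < 1000000; i++) {
            map.put("row-" + i, new byte[16]);
        }
        Runtime rt = Runtime.getRuntime();
        System.out.println("approx heap used: "
                + (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024)
                + " MB");
    }
}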

> This looks like somebody else recently had a similar problem (bottom
> line: more heap - which is okay, but I'd like to understand why):
> http://www.mail-archive.com/user@cassandra.apache.org/msg07516.html

There should be no magic need for memory. Unless you are severely
taxing the node with a very high write load or similar, an
out-of-the-box configured Cassandra should need only a limited amount
of memory. Did you run with the default memtable thresholds
(memtable_throughput_in_mb in particular)? How many column families do
you have? You need memory for *each* column family, taking into
account its memtable thresholds.
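
Concretely, the rule of thumb from the MemtableThresholds page you
linked works out like this for your settings (the 3x is a fudge
factor for JVM object overhead, if I remember the page correctly):

  memtable_throughput_in_mb * 3 * (number of hot CFs) + ~1GB internals
  = 256MB * 3 * 1 + 1GB
  ~= 1.75GB minimum heap

So a 2GB heap is only barely above the floor even for a single column
family at the default threshold.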

> This is my only CF currently in use (via JMX):
>
> - column_families:
>  - column_type: Standard
>    comment: tracking column family
>    compare_with: org.apache.cassandra.db.marshal.UTF8Type
>    default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
>    gc_grace_seconds: 864000
>    key_cache_save_period_in_seconds: 3600
>    keys_cached: 200000.0
>    max_compaction_threshold: 32
>    memtable_flush_after_mins: 60
>    min_compaction_threshold: 4
>    name: tracking
>    read_repair_chance: 1.0
>    row_cache_save_period_in_seconds: 0
>    rows_cached: 0.0
>  name: test
>  replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
>  replication_factor: 3

This is the only column family being used?

> In addition... there actually seems to be plenty of free memory on the heap (?):
>
> 3605.478: [GC 3605.478: [ParNew
> Desired survivor size 2162688 bytes, new threshold 1 (max 1)
> - age   1:     416112 bytes,     416112 total
> : 16887K->553K(38336K), 0.0209550 secs]3605.499: [CMS: 
> 1145267K->447565K(2054592K), 1.9143630 secs] 1161938K->447565K(2092928K), 
> [CMS Perm : 18186K->18158K(30472K)], 1.9355340 secs] [Times: user=1.95 
> sys=0.00, real=1.94 secs]
> 3607.414: [Full GC 3607.414: [CMS: 447565K->447453K(2054592K), 1.9694960 
> secs] 447565K->447453K(2092928K), [CMS Perm : 18158K->18025K(30472K)], 
> 1.9696450 secs] [Times: user=1.92 sys=0.00, real=1.97 secs]

1.9 seconds to do [CMS: 1145267K->447565K(2054592K)] is completely
abnormal if it represents a pause (but not if it is just the timing of
the concurrent mark/sweep work). I don't quite recognize the format of
this log and am not sure what produced it. A normal -XX:+PrintGC plus
-XX:+PrintGCDetails should yield output like:

22.634: [GC 22.634: [ParNew: 19136K->2112K(19136K), 0.0058490 secs]
123155K->108012K(202688K), 0.0059050 secs] [Times: user=0.02 sys=0.00,
real=0.00 secs]
22.660: [GC 22.660: [ParNew: 19136K->2112K(19136K), 0.0063220 secs]
125036K->109893K(202688K), 0.0063910 secs] [Times: user=0.02 sys=0.00,
real=0.01 secs]
22.686: [GC 22.686: [ParNew: 19136K->2112K(19136K), 0.0067150 secs]
126917K->111864K(202688K), 0.0067730 secs] [Times: user=0.02 sys=0.00,
real=0.01 secs]
22.724: [CMS-concurrent-abortable-preclean: 0.041/0.257 secs] [Times:
user=0.24 sys=0.01, real=0.26 secs]
22.724: [GC[YG occupancy: 19037 K (19136 K)]22.724: [Rescan (parallel)
, 0.0085230 secs]22.732: [weak refs processing, 0.0142440 secs] [1
CMS-remark: 109752K(183552K)] 128790K(202688K), 0.0228750 secs]
[Times: user=0.03 sys=0.00, real=0.02 secs]
22.747: [CMS-concurrent-sweep-start]
22.749: [GC 22.749: [ParNew: 19136K->2112K(19136K), 0.0082060 secs]
128813K->113837K(202688K), 0.0082620 secs] [Times: user=0.02 sys=0.00,
real=0.01 secs]
22.772: [GC 22.772: [ParNew: 19136K->1500K(19136K), 0.0064660 secs]
116825K->101174K(202688K), 0.0065400 secs] [Times: user=0.02 sys=0.00,
real=0.01 secs]
22.804: [CMS-concurrent-sweep: 0.041/0.057 secs] [Times: user=0.12
sys=0.00, real=0.05 secs]
22.804: [CMS-concurrent-reset-start]
22.816: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.02
sys=0.00, real=0.02 secs]
22.910: [GC 22.910: [ParNew: 18524K->2112K(19136K), 0.0024290 secs]
98398K->81987K(202688K), 0.0024860 secs] [Times: user=0.01 sys=0.00,
real=0.00 secs]
22.938: [GC 22.938: [ParNew: 19136K->2112K(19136K), 0.0077940 secs]
99011K->83980K(202688K), 0.0078510 secs] [Times: user=0.02 sys=0.00,
real=0.01 secs]
22.965: [GC 22.965: [ParNew: 19136K->2112K(19136K), 0.0063160 secs]
101004K->85959K(202688K), 0.0063760 secs] [Times: user=0.02 sys=0.00,
real=0.01 secs]

There you can clearly see what constitutes stop-the-world time and
what is just timing of CMS activity in the background. Yours seems to
be somewhere in between; I vaguely recognize the format but don't
remember what produces it.
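
For reference, output in the unambiguous format above comes from
standard HotSpot flags along these lines (the log path is a
placeholder):

-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
-XX:+PrintGCApplicationStoppedTime -Xloggc:/path/to/gc.log

-XX:+PrintGCApplicationStoppedTime in particular prints the total
stop-the-world time for each pause, which removes the guesswork about
what was a pause and what was concurrent work.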

Anyway, it certainly looks like you have reasonable heap usage going
on, as expected.

I'm concerned since you reported that independent JVMs like nodetool
suffered OOMs as well; that really indicates there is a genuine
problem of some kind.

-- 
/ Peter Schuller
