On Dec 14, 2010, at 19:38, Peter Schuller wrote:

> For debugging purposes you may want to switch Cassandra to "standard"
> IO mode instead of mmap. This will have a performance-penalty, but the
> virtual/resident sizes won't be polluted with mmap():ed data.

Already did so. It *seems* to run more stably, but it's still far from stable. I have actually already put 100 million rows into a local Cassandra instance (on OSX [and on rc1], not xen'ed Linux), so this is unlikely to be a problem in the Cassandra Java code; more likely it is something native code/platform related.

> In general, unless you're hitting something particularly strange or
> just a bug in Cassandra, you shouldn't be randomly getting OOM:s
> unless you are truly using that heap space. What do you mean by
> "always bound in compactionexecutor" - by what method did you
> determine this to be the case?

heap dumps -> MAT (http://www.eclipse.org/mat/)

> There should be no magic need for CPU. Unless you are severely taxing
> it in terms of very high write load or similar, an out-of-the-box
> configured cassandra should be needing limited amounts of memory. Did
> you run with default memtable thresholds (memtable_throughput_in_mb i

Yes

>> This is my only CF currently in use (via JMX):
>>
>> - column_families:
>>   - column_type: Standard
>>     comment: tracking column family
>>     compare_with: org.apache.cassandra.db.marshal.UTF8Type
>>     default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
>>     gc_grace_seconds: 864000
>>     key_cache_save_period_in_seconds: 3600
>>     keys_cached: 200000.0
>>     max_compaction_threshold: 32
>>     memtable_flush_after_mins: 60
>>     min_compaction_threshold: 4
>>     name: tracking
>>     read_repair_chance: 1.0
>>     row_cache_save_period_in_seconds: 0
>>     rows_cached: 0.0
>>   name: test
>>   replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
>>   replication_factor: 3
>
> This is the only column family being used?

Currently, for testing, yes.

>> In addition...actually there is plenty of free memory on the heap (?):
>>
>> 3605.478: [GC 3605.478: [ParNew
>> Desired survivor size 2162688 bytes, new threshold 1 (max 1)
>> - age 1: 416112 bytes, 416112 total
>> : 16887K->553K(38336K), 0.0209550 secs]3605.499: [CMS:
>> 1145267K->447565K(2054592K), 1.9143630 secs] 1161938K->447565K(2092928K),
>> [CMS Perm : 18186K->18158K(30472K)], 1.9355340 secs] [Times: user=1.95
>> sys=0.00, real=1.94 secs]
>> 3607.414: [Full GC 3607.414: [CMS: 447565K->447453K(2054592K), 1.9694960
>> secs] 447565K->447453K(2092928K), [CMS Perm : 18158K->18025K(30472K)],
>> 1.9696450 secs] [Times: user=1.92 sys=0.00, real=1.97 secs]
>
> 1.9 seconds to do [CMS: 1145267K->447565K(2054592K) is completely
> abnormal if that represents a pause (but not if it's just concurrent
> mark/sweep time). I don't quite recognize the format of this log...
> I'm suddenly unsure what this log output is coming from. A normal
> -XX:+PrintGC and -XX:+PrintGCDetails should yield stuff like:

I just uncommented the GC JVMOPTS in the shipped cassandra start script, and I use Sun JVM 1.6.0_23. Hmm, but these "GC tuning options" are also uncommented. I'll comment them out again and retry.
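
As a rough check of the "plenty of free memory" reading of the GC log quoted above, the whole-heap figures of the two collections can be pulled out with a few lines of Python. This is only a sketch: the regex and the name `heap_after_gc` are illustrative assumptions that fit the log layout quoted in this mail, not a general GC log parser.

```python
import re

# Whole-heap summary "before->after(capacity)" that directly follows the
# closing "secs]" of the old-generation (CMS) entry in the quoted log.
# Assumption: this pattern is tailored to the two log records in this mail.
HEAP_SUMMARY = re.compile(r"secs\]\s*(\d+)K->(\d+)K\((\d+)K\)")


def heap_after_gc(log_text):
    """Return a list of (after_kb, capacity_kb, used_fraction) tuples."""
    flat = " ".join(log_text.split())  # undo the mail client's line wrapping
    results = []
    for match in HEAP_SUMMARY.finditer(flat):
        _before_kb, after_kb, capacity_kb = map(int, match.groups())
        results.append((after_kb, capacity_kb, after_kb / capacity_kb))
    return results


# The log from this mail, with the ">> " quote prefixes removed.
GC_LOG = """
3605.478: [GC 3605.478: [ParNew
Desired survivor size 2162688 bytes, new threshold 1 (max 1)
- age 1: 416112 bytes, 416112 total
: 16887K->553K(38336K), 0.0209550 secs]3605.499: [CMS:
1145267K->447565K(2054592K), 1.9143630 secs] 1161938K->447565K(2092928K),
[CMS Perm : 18186K->18158K(30472K)], 1.9355340 secs] [Times: user=1.95
sys=0.00, real=1.94 secs]
3607.414: [Full GC 3607.414: [CMS: 447565K->447453K(2054592K), 1.9694960
secs] 447565K->447453K(2092928K), [CMS Perm : 18158K->18025K(30472K)],
1.9696450 secs] [Times: user=1.92 sys=0.00, real=1.97 secs]
"""

if __name__ == "__main__":
    for after_kb, capacity_kb, used in heap_after_gc(GC_LOG):
        print(f"{after_kb}K of {capacity_kb}K in use after GC ({used:.1%})")
```

For both collections this gives roughly a fifth of the ~2 GB heap still occupied after the collection, which is what the "plenty of free memory" remark refers to.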