Hi community,
during my tests i had several OOM crashes.
Getting some hints to find the problem would be nice.
First cassandra crashes after about 45 min insert test script.
During the following tests time to OOM was shorter until it started to crash
even in "idle" mode.
Here the facts:
- cassandra 0.7 beta 3
- using lucandra to index about 3 million files ~1kb data
- inserting with one client to one cassandra node with about 200 files/s
- cassandra data files for this keyspace grow up to about 20 GB
- the keyspace only contains the two lucandra specific CFs
Cluster:
- cassandra single node on windows 32bit, Xeon 2,5 Ghz, 4GB RAM
- java jre 1.6.0_22
- heap space first 1GB, later increased to 1,3 GB
Cassandra.yaml:
default + reduced "binary_memtable_throughput_in_mb" to 128
CFs:
default + reduced
min_compaction_threshold: 4
max_compaction_threshold: 8
I think the problem appears always during compaction,
and perhaps it is a result of large rows (some about 170mb).
Are there more optionions we could use to work with few memory?
Is it a problem of compaction?
And how to avoid?
Slower inserts? More memory?
Even fewer memtable_throuput or in_memory_compaction_limit?
Continuous manual major comapction?
I've read
http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors
- row_size should be fixed since 0.7 and 200mb is still far away from 2gb
- only key cache is used a little bit 3600/20000
- after a lot of writes cassandra crashes even in idle mode
- memtablesize was reduced and there are only 2 CFs
Several heapdumps in MAT show 60-99% heapusage of compaction thread.
Here some log extract:
INFO [CompactionExecutor:1] 2010-11-26 14:18:18,593
CompactionIterator.java (line 134) Compacting large row
6650325572717566efbfbf44545241434b53efbfbf31 (172967291 bytes)
incrementally
INFO [ScheduledTasks:1] 2010-11-26 14:18:41,421 GCInspector.java
(line 133) GC for ParNew: 365 ms, 54551328 reclaimed leaving 459496840
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:42,437 GCInspector.java
(line 133) GC for ParNew: 226 ms, 12469104 reclaimed leaving 554506776
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:43,453 GCInspector.java
(line 133) GC for ParNew: 224 ms, 12777840 reclaimed leaving 649207976
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:44,468 GCInspector.java
(line 133) GC for ParNew: 225 ms, 12564144 reclaimed leaving 744122872
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:45,468 GCInspector.java
(line 133) GC for ParNew: 222 ms, 16020328 reclaimed leaving 835581584
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:46,468 GCInspector.java
(line 133) GC for ParNew: 226 ms, 12697912 reclaimed leaving 930362712
used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:47,468 GCInspector.java
(line 133) GC for ParNew: 227 ms, 15816872 reclaimed leaving
1022026288 used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:48,484 GCInspector.java
(line 133) GC for ParNew: 258 ms, 12746584 reclaimed leaving
1116758744 used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:49,484 GCInspector.java
(line 133) GC for ParNew: 257 ms, 12802608 reclaimed leaving
1211435176 used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java
(line 133) GC for ConcurrentMarkSweep: 4188 ms, 271308512 reclaimed
leaving 1047605704 used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java
(line 153) Pool Name Active Pending
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java
(line 160) ResponseStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java
(line 160) ReadStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java
(line 160) ReadRepair 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java
(line 160) MutationStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java
(line 160) GossipStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java
(line 160) AntientropyStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java
(line 160) MigrationStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java
(line 160) StreamStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java
(line 160) MemtablePostFlusher 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java
(line 160) FlushWriter 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java
(line 160) MiscStage 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java
(line 160) FlushSorter 0 0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java
(line 164) CompactionManager n/a 6
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java
(line 176) MessagingService n/a 0,0
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java
(line 180) ColumnFamily Memtable ops,data Row cache
size/cap Key cache size/cap
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java
(line 183) system.LocationInfo 1,17
0/0 3/3
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java
(line 183) system.HintsColumnFamily 0,0
0/0 0/1
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java
(line 183) system.Migrations 0,0
0/0 0/2
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java
(line 183) system.Schema 0,0
0/0 3/3
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java
(line 183) system.IndexInfo 0,0
0/0 0/1
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java
(line 183) Lucandra.Documents 0,0
0/0 0/200000
INFO [ScheduledTasks:1] 2010-11-26 14:18:54,578 GCInspector.java
(line 183) Lucandra.TermInfo 0,0
0/0 0/200000
INFO [ScheduledTasks:1] 2010-11-26 14:19:05,328 GCInspector.java
(line 133) GC for ConcurrentMarkSweep: 3807 ms, 31040 reclaimed
leaving 1153889144 used; max is 1450442752
INFO [ScheduledTasks:1] 2010-11-26 14:20:09,093 GCInspector.java
(line 153) Pool Name Active Pending
INFO [ScheduledTasks:1] 2010-11-26 14:20:09,093 GCInspector.java
(line 160) ResponseStage 0 0
ERROR [CompactionExecutor:1] 2010-11-26 14:20:09,093
AbstractCassandraDaemon.java (line 89) Fatal exception in thread
Thread[CompactionExecutor:1,1,main]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
at
org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:312)
at
org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:291)
at
org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:82)
at
org.apache.cassandra.db.ColumnFamilySerializer.serializeWithIndexes(ColumnFamilySerializer.java:101)
at
org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:96)
at
org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:138)
at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
at
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
at
org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:273)
at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:109)
at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:87)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)