This is "A row has grown too large" section from that troubleshooting guide.
On Tue, Dec 14, 2010 at 5:27 AM, Timo Nentwig <timo.nent...@toptarif.de> wrote:

> On Dec 12, 2010, at 17:21, Jonathan Ellis wrote:
>
> > http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors
>
> I can rule out the first 3. I was running Cassandra with default settings,
> i.e. a 1 GB heap and 256 MB memtables. So, with 3 memtables + 1 GB the JVM
> should run with >1.75 GB (although
> http://wiki.apache.org/cassandra/MemtableThresholds suggests increasing the
> heap size only gently). I did so. A 4 GB machine with a 2 GB 64-bit JVM
> seemed to run stably for quite some time, but then it also crashed with OOM.
> Looking at the heap dump it's always the same: nearly all memory is bound in
> the CompactionExecutor (ColumnFamilyStore/ConcurrentSkipListMap,
> respectively).
>
> It looks like somebody else recently had a similar problem (bottom line:
> more heap, which is okay, but I'd like to understand why):
> http://www.mail-archive.com/user@cassandra.apache.org/msg07516.html
>
> This is my only CF currently in use (via JMX):
>
> - column_families:
>     - column_type: Standard
>       comment: tracking column family
>       compare_with: org.apache.cassandra.db.marshal.UTF8Type
>       default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
>       gc_grace_seconds: 864000
>       key_cache_save_period_in_seconds: 3600
>       keys_cached: 200000.0
>       max_compaction_threshold: 32
>       memtable_flush_after_mins: 60
>       min_compaction_threshold: 4
>       name: tracking
>       read_repair_chance: 1.0
>       row_cache_save_period_in_seconds: 0
>       rows_cached: 0.0
>   name: test
>   replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
>   replication_factor: 3
>
> In addition... there actually is plenty of free memory on the heap (?):
>
> 3605.478: [GC 3605.478: [ParNew
> Desired survivor size 2162688 bytes, new threshold 1 (max 1)
> - age 1: 416112 bytes, 416112 total
> : 16887K->553K(38336K), 0.0209550 secs]3605.499: [CMS: 1145267K->447565K(2054592K), 1.9143630 secs] 1161938K->447565K(2092928K), [CMS Perm : 18186K->18158K(30472K)], 1.9355340 secs] [Times: user=1.95 sys=0.00, real=1.94 secs]
> 3607.414: [Full GC 3607.414: [CMS: 447565K->447453K(2054592K), 1.9694960 secs] 447565K->447453K(2092928K), [CMS Perm : 18158K->18025K(30472K)], 1.9696450 secs] [Times: user=1.92 sys=0.00, real=1.97 secs]
> Total time for which application threads were stopped: 3.9070380 seconds
> Total time for which application threads were stopped: 7.3388640 seconds
> Total time for which application threads were stopped: 0.0560610 seconds
> 3616.931: [GC 3616.931: [ParNew
> Desired survivor size 2162688 bytes, new threshold 1 (max 1)
> - age 1: 474264 bytes, 474264 total
> : 34112K->747K(38336K), 0.0098680 secs] 481565K->448201K(2092928K), 0.0099690 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
> Total time for which application threads were stopped: 0.0108670 seconds
> 3617.035: [GC 3617.035: [ParNew
> Desired survivor size 2162688 bytes, new threshold 1 (max 1)
> - age 1: 63040 bytes, 63040 total
> : 34859K->440K(38336K), 0.0065950 secs] 482313K->448455K(2092928K), 0.0066880 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> Total time for which application threads were stopped: 0.0075850 seconds
> 3617.133: [GC 3617.133: [ParNew
> Desired survivor size 2162688 bytes, new threshold 1 (max 1)
> - age 1: 23016 bytes, 23016 total
> : 34552K->121K(38336K), 0.0042920 secs] 482567K->448193K(2092928K), 0.0043650 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
> Total time for which application threads were stopped: 0.0049630 seconds
> 3617.228: [GC 3617.228: [ParNew
> Desired survivor size 2162688 bytes, new threshold 1 (max 1)
> - age 1: 16992 bytes, 16992 total
> : 34233K->34K(38336K), 0.0043180 secs] 482305K->448122K(2092928K), 0.0043910 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> Total time for which application threads were stopped: 0.0049150 seconds
> 3617.323: [GC 3617.323: [ParNew
> Desired survivor size 2162688 bytes, new threshold 1 (max 1)
> - age 1: 18456 bytes, 18456 total
> : 34146K->29K(38336K), 0.0038930 secs] 482234K->448127K(2092928K), 0.0039810 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> Total time for which application threads were stopped: 0.0055390 seconds
> Heap
>  par new generation   total 38336K, used 17865K [0x000000077ae00000, 0x000000077d790000, 0x000000077d790000)
>   eden space 34112K,  52% used [0x000000077ae00000, 0x000000077bf6afb0, 0x000000077cf50000)
>   from space 4224K,   0% used [0x000000077cf50000, 0x000000077cf57720, 0x000000077d370000)
>   to   space 4224K,   0% used [0x000000077d370000, 0x000000077d370000, 0x000000077d790000)
>  concurrent mark-sweep generation total 2054592K, used 448097K [0x000000077d790000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 30472K, used 18125K [0x00000007fae00000, 0x00000007fcbc2000, 0x0000000800000000)
>
> > On Sun, Dec 12, 2010 at 9:52 AM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
> >
> > > On Dec 10, 2010, at 19:37, Peter Schuller wrote:
> > >
> > > > To cargo cult it: Are you running a modern JVM? (Not e.g. openjdk b17
> > > > in lenny or some such.) If it is a JVM issue, ensuring you're using a
> > > > reasonably recent JVM is probably much easier than starting to track
> > > > it down...
> > >
> > > I had OOM problems with OpenJDK, switched to Sun/Oracle's recent
> > > 1.6.0_23 and... still have the same problem :-\ The stack trace always
> > > looks the same:
> > >
> > > java.lang.OutOfMemoryError: Java heap space
> > >     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
> > >     at java.nio.ByteBuffer.allocate(ByteBuffer.java:329)
> > >     at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:261)
> > >     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
> > >     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
> > >     at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
> > >     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
> > >     at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
> > >     at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
> > >     at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
> > >     at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
> > >     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > >     at java.lang.Thread.run(Thread.java:636)
> > >
> > > I'm writing from 1 client with 50 threads to a cluster of 4 machines
> > > (with hector). With QUORUM as well as ONE, 2 machines quite reliably
> > > soon die with OOM. What may cause this? Won't Cassandra block/reject
> > > writes while a memtable is full and being flushed to disk, instead of
> > > growing and, if flushing to disk isn't fast enough, running out of
> > > memory?
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of Riptano, the source for professional Cassandra support
> > http://riptano.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
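A note on the heap arithmetic in Timo's first paragraph ("3 memtables + 1 GB ... >1.75G"): the sketch below just spells out that estimate with the numbers from the thread, a 256 MB memtable threshold and a single actively written column family. The constants and the class name are illustrative, taken from the message rather than measured.

    // Rough heap estimate in the spirit of the arithmetic in the thread:
    //   heap >= memtable_threshold_mb * 3 * hot_column_families + ~1 GB baseline
    public class HeapEstimate {
        public static void main(String[] args) {
            int memtableThresholdMb = 256; // per-CF memtable size mentioned in the thread
            int hotColumnFamilies = 1;     // only the "tracking" CF is being written to
            int baselineMb = 1024;         // JVM + Cassandra internals + caches (assumed)

            int estimateMb = memtableThresholdMb * 3 * hotColumnFamilies + baselineMb;
            System.out.printf("suggested minimum heap: ~%d MB (~%.2f GB)%n",
                    estimateMb, estimateMb / 1024.0);
            // Prints ~1792 MB (~1.75 GB), matching the ">1.75G" figure above.
        }
    }

Which is why the 1 GB default heap is tight for a 256 MB memtable threshold, and why the 2 GB heap Timo moved to sits only just above the estimate.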
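The stack trace in the quoted Dec 12 message dies in FBUtilities.readByteArray while deserializing an incoming RowMutation: the node reads a length from the wire and then has to allocate the whole column value in one go. That is the shape of the "row has grown too large" failure: one very large row or column value (or a corrupted length) forces a single allocation that can exceed the heap, even when the GC log shows the old generation mostly empty, as it does above. The following is not Cassandra code; it is a minimal, hypothetical sketch of a length-prefixed deserializer with the same failure mode (class and method names are made up for illustration).

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;

    public class OversizedValueDemo {

        // Reads a 4-byte length prefix and allocates a buffer of exactly that size,
        // the way a serialized value has to be materialized on the receiving node.
        static ByteBuffer readValue(DataInputStream in) throws IOException {
            int length = in.readInt();                      // trusted blindly, like a wire length prefix
            ByteBuffer value = ByteBuffer.allocate(length); // OutOfMemoryError if length is huge
            in.readFully(value.array());
            return value;
        }

        public static void main(String[] args) throws IOException {
            // A well-behaved value: a 5-byte payload.
            ByteArrayOutputStream small = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(small);
            byte[] payload = "hello".getBytes("UTF-8");
            out.writeInt(payload.length);
            out.write(payload);
            ByteBuffer ok = readValue(new DataInputStream(new ByteArrayInputStream(small.toByteArray())));
            System.out.println("read " + ok.capacity() + " bytes");

            // A value that has "grown too large": the length prefix alone demands a
            // ~2 GB allocation before a single payload byte has been read.
            ByteArrayOutputStream big = new ByteArrayOutputStream();
            new DataOutputStream(big).writeInt(2000000000);
            try {
                readValue(new DataInputStream(new ByteArrayInputStream(big.toByteArray())));
            } catch (OutOfMemoryError e) {
                System.out.println("single allocation failed: " + e);
            } catch (IOException e) {
                System.out.println("allocation fit this heap, but the payload is missing: " + e);
            }
        }
    }

Run with a modest heap (for example -Xmx256m) and the second read fails much like the trace above, inside ByteBuffer.allocate, with plenty of heap free on average but not enough for that one allocation.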
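Timo's closing question, whether writes are blocked while a memtable is being flushed, is really a question about backpressure, and the answer depends on the Cassandra version in use, so it is not settled here. As a generic illustration of the failure mode he is worried about, though: when incoming work lands on an unbounded queue faster than a slow consumer can drain it, memory grows without bound even if every individual item is small. A self-contained sketch with made-up sizes and rates, nothing Cassandra-specific about it:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class UnboundedQueueDemo {
        public static void main(String[] args) throws InterruptedException {
            // newSingleThreadExecutor() is backed by an unbounded LinkedBlockingQueue,
            // so submit() never blocks the producer, no matter how far behind the
            // consumer falls.
            ExecutorService consumer = Executors.newSingleThreadExecutor();
            try {
                for (int i = 0; i < 100000; i++) {
                    byte[] payload = new byte[10000];            // ~10 KB of pending work per task
                    consumer.submit(() -> slowProcess(payload)); // queues roughly 1 GB in total
                }
            } catch (OutOfMemoryError e) {
                System.out.println("producer outran the consumer: " + e);
            }
            System.out.printf("heap in use after queueing: %d MB%n",
                    (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / (1024 * 1024));
            consumer.shutdownNow();
            consumer.awaitTermination(5, TimeUnit.SECONDS);
        }

        private static void slowProcess(byte[] payload) {
            try {
                Thread.sleep(10); // pretend each item takes 10 ms to "flush"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

With a small heap (again, -Xmx256m) the producer loop itself dies with Java heap space; with a larger heap it merely shows hundreds of megabytes pinned by queued work. Either way the pressure comes from the backlog rather than from any single item, which is the other common way a write-heavy client can push a node toward OOM.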