> heap of 1867M is kind of small. According to the discussion on this list,
> it's advisable to have m1.xlarge.

+1

In cassandra-env.sh set MAX_HEAP_SIZE to 4GB and HEAP_NEWSIZE to 400M.

In the yaml file set:

in_memory_compaction_limit_in_mb to 32
compaction_throughput_mb_per_sec to 8
concurrent_compactors to 2

This will slow down compaction a lot. You may want to restore some of these settings once you have things stable.

You have an underpowered box for what you are trying to do.
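Concretely, a minimal sketch of what those settings look like, assuming the stock conf/ layout of a 1.2 install (your paths and existing values may differ, and the node needs a restart for the changes to take effect):

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"

    # conf/cassandra.yaml
    in_memory_compaction_limit_in_mb: 32
    compaction_throughput_mb_per_sec: 8
    concurrent_compactors: 2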
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/03/2013, at 4:47 PM, Wei Zhu <wz1...@yahoo.com> wrote:

> It's clear you are out of memory. How big is your data size?
> heap of 1867M is kind of small. According to the discussion on this list,
> it's advisable to have m1.xlarge.
>
> Attached please find the related thread.
>
> -Wei
>
> ----- Original Message -----
> From: "Dane Miller" <d...@optimalsocial.com>
> To: user@cassandra.apache.org
> Sent: Wednesday, March 20, 2013 7:13:44 PM
> Subject: Stream fails during repair, two nodes out-of-memory
>
> After having just solved one repair problem, I immediately hit
> another. Again, much appreciation for suggestions...
>
> I'm having problems repairing a CF, and the failure consistently brings
> down 2 of the 6 nodes in the cluster. I'm running "repair -pr" on a
> single CF on node2, the repair starts streaming, and after about 60
> seconds both node2 and node4 crash with java.lang.OutOfMemoryError.
> The keyspace has rf=3 and is being actively written to by our
> application.
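[A sketch of the repair invocation being described above; "MyKeyspace" and "MyCF" are placeholders, since the actual keyspace and column family names are not given in the thread:

    nodetool -h node2 repair -pr MyKeyspace MyCF
]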
>
> The abbreviated logs below show the pattern, after which I kill -9
> and restart cassandra on the two nodes. What extra info should I
> include? I'm kind of overwhelmed by the volume of logs being
> generated and not sure what is signal vs noise. I'm especially seeing
> big repeating sections of StatusLogger and FlushWriter/Memtable.
>
> Details:
> 6 node cluster
> cassandra 1.2.2 - single token per node
> RandomPartitioner, EC2Snitch
> Replication: SimpleStrategy, rf=3
> Ubuntu 10.10 x86_64
> EC2 m1.large
> Cassandra max heap: 1867M
>
>
> node2 (abbreviated logs)
>
> ERROR 21:11:22 AbstractStreamSession.java Stream failed because [node4] died
> GC for ConcurrentMarkSweep: 2365 ms for 2 collections, 1913603168 used; max is 1937768448
> Pool Name                    Active   Pending   Blocked
> ReadStage                         7         7         0
> RequestResponseStage              0         0         0
> ReadRepairStage                   0         0         0
> MutationStage                    32      4707         0
> ReplicateOnWriteStage             0         0         0
> GossipStage                       0         0         0
> AntiEntropyStage                  0         0         0
> MigrationStage                    0         0         0
> MemtablePostFlusher               1         1         0
> FlushWriter                       1         1         0
> MiscStage                         0         0         0
> commitlog_archiver                0         0         0
> InternalResponseStage             0         0         0
> AntiEntropySessions               1         1         0
> HintedHandoff                     0         0         0
> CompactionManager                 1        21
> MessagingService                n/a    291,35
> WARN 21:12:52 GCInspector.java Heap is 0.9875293252788064 full
> INFO 21:12:52 Gossiper.java InetAddress [node5] is now dead.
> INFO 21:12:52 Gossiper.java InetAddress [node1] is now dead.
> INFO 21:12:52 Gossiper.java InetAddress [node6] is now dead.
> INFO 21:12:52 ColumnFamilyStore.java Enqueuing flush of Memtable-[MyCF]@...
> INFO 21:12:52 MessagingService.java 4415 MUTATION messages dropped in last 5000ms
> INFO 21:12:52 Gossiper.java InetAddress [node5] is now UP
> INFO 21:12:52 Gossiper.java InetAddress [node1] is now UP
> INFO 21:12:52 Gossiper.java InetAddress [node6] is now UP
> INFO 21:12:52 HintedHandOffManager.java Started hinted handoff for host: [node5]
> INFO 21:12:52 HintedHandOffManager.java Started hinted handoff for host: [node1]
> ERROR 21:12:56 CassandraDaemon.java java.lang.OutOfMemoryError: Java heap space
> (full OutOfMemory stack trace is included at bottom)
>
> node4 (abbreviated logs)
>
> INFO 21:10:05 StreamOutSession.java Streaming to [node2]
> INFO 21:10:14 CompactionTask.java Compacted 4 sstables to [MyCF-ib-17665]
> INFO 21:10:24 StreamReplyVerbHandler.java Successfully sent [MyCF]-ib-17647-Data.db to [node2]
> INFO 21:10:24 GCInspector.java GC for ConcurrentMarkSweep
> GC for ConcurrentMarkSweep: 764 ms for 3 collections, 1408393640 used; max is 1937768448
> GC for ConcurrentMarkSweep: 2198 ms for 2 collections, 1882942392 used; max is 1937768448
> Pool Name                    Active   Pending   Blocked
> ReadStage                         5         5         0
> RequestResponseStage              0        20         0
> ReadRepairStage                   0         0         0
> MutationStage                     0         0         0
> ReplicateOnWriteStage             0         0         0
> GossipStage                       0         8         0
> AntiEntropyStage                  0         0         0
> MigrationStage                    0         0         0
> MemtablePostFlusher               0         0         0
> FlushWriter                       0         0         0
> MiscStage                         0         0         0
> commitlog_archiver                0         0         0
> InternalResponseStage             0         0         0
> AntiEntropySessions               0         0         0
> HintedHandoff                     1         1         0
> CompactionManager                 0         6
> MessagingService                n/a     10,15
> INFO 21:11:35 Gossiper.java InetAddress [node5] is now dead.
> INFO 21:11:35 Gossiper.java InetAddress [node2] is now dead.
> ERROR 21:13:17 CassandraDaemon.java java.lang.OutOfMemoryError: Java heap space
> (full OutOfMemory stack trace is included at bottom)
>
>
> node2 full OOM stack trace:
>
> ERROR [Thread-417] 2013-03-20 21:12:56,114 CassandraDaemon.java (line 133) Exception in thread Thread[Thread-417,5,main]
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.cassandra.utils.obs.OpenBitSet.<init>(OpenBitSet.java:76)
>     at org.apache.cassandra.utils.FilterFactory.createFilter(FilterFactory.java:143)
>     at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:114)
>     at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:101)
>     at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:40)
>     at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:31)
>     at org.apache.cassandra.db.ColumnIndex$Builder.<init>(ColumnIndex.java:74)
>     at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:243)
>     at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
>     at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
>     at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
>     at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>
> node4 full OOM stack trace:
>
> ERROR [Thread-326] 2013-03-20 21:13:22,829 CassandraDaemon.java (line 133) Exception in thread Thread[Thread-326,5,main]
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.cassandra.utils.obs.OpenBitSet.<init>(OpenBitSet.java:76)
>     at org.apache.cassandra.utils.FilterFactory.createFilter(FilterFactory.java:143)
>     at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:114)
>     at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:101)
>     at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:40)
>     at org.apache.cassandra.db.ColumnIndex.<init>(ColumnIndex.java:31)
>     at org.apache.cassandra.db.ColumnIndex$Builder.<init>(ColumnIndex.java:74)
>     at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:243)
>     at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
>     at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
>     at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
>     at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>
> Dane
> <attachment.eml>