> several compactions on few 200-300 GB SSTables

Sounds like some big files. Out of interest, how much data do you have per node? Also, do you have wide rows? You can check via nodetool cfstats.
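For the wide-row check, here is a rough sketch of pulling the largest compacted row size per column family out of saved nodetool cfstats output. It assumes the output contains "Column Family:" and "Compacted row maximum size:" lines, as 0.8-era output does; the function name and the sample text are made up for illustration and are not output from this cluster.

```python
import re


def max_row_sizes(cfstats_text):
    """Map column family name -> 'Compacted row maximum size' in bytes."""
    sizes = {}
    current_cf = None
    for raw_line in cfstats_text.splitlines():
        line = raw_line.strip()
        cf_match = re.match(r"Column Family:\s*(\S+)", line)
        if cf_match:
            # Remember which column family the following stats belong to.
            current_cf = cf_match.group(1)
            continue
        size_match = re.match(r"Compacted row maximum size:\s*(\d+)", line)
        if size_match and current_cf is not None:
            sizes[current_cf] = int(size_match.group(1))
    return sizes


# Hypothetical sample text (assumption), not real output from this thread.
SAMPLE = """\
Keyspace: Demo
    Column Family: Events
    Compacted row minimum size: 87
    Compacted row maximum size: 557074610
    Compacted row mean size: 1205
    Column Family: Users
    Compacted row maximum size: 2759
"""

if __name__ == "__main__":
    for cf, size in sorted(max_row_sizes(SAMPLE).items()):
        print(f"{cf}: {size / (1024 * 1024):.1f} MB")
```

A maximum in the tens or hundreds of MB for a column family would suggest wide rows there.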
In cases where OOM / GC is related to compaction, these are the steps I take first. It's heavy handed and will probably increase the IO load. Once you stabilise, you should see if you can raise the settings again.

In cassandra.yaml:

* Set concurrent_compactors to 2. This will reduce the number of concurrent compactions.
* If you have wide rows, reduce in_memory_compaction_limit_in_mb to 32 or lower.

(As you are on 0.8.X, also check that memtable_total_space_in_mb is enabled.)

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/02/2012, at 10:14 AM, Feng Qu wrote:

> Hello,
>
> We have a 6-node ring running 0.8.6 on RHEL 6.1. The first node also runs
> OpsCenter community. This node has crashed a few times recently with
> "OutOfMemoryError: Java heap space" while several compactions on few 200-300
> GB SSTables were running. We are using 8GB Java heap on host with 96GB RAM.
>
> I would appreciate help figuring out the root cause and a solution.
>
> Feng Qu
>
> INFO [GossipTasks:1] 2012-02-22 13:15:59,135 Gossiper.java (line 697) InetAddress /10.89.74.67 is now dead.
> INFO [ScheduledTasks:1] 2012-02-22 13:16:12,114 StatusLogger.java (line 65) ReadStage 0 0 0
> ERROR [CompactionExecutor:10538] 2012-02-22 13:16:12,115 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[CompactionExecutor:10538,1,main]
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:123)
>     at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:57)
>     at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:664)
>     at org.apache.cassandra.db.compaction.CompactionIterator.getCollatingIterator(CompactionIterator.java:92)
>     at org.apache.cassandra.db.compaction.CompactionIterator.<init>(CompactionIterator.java:68)
>     at org.apache.cassandra.db.compaction.CompactionManager.doCompactionWithoutSizeEstimation(CompactionManager.java:553)
>     at org.apache.cassandra.db.compaction.CompactionManager.doCompaction(CompactionManager.java:507)
>     at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:142)
>     at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:108)
>     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> INFO [GossipTasks:1] 2012-02-22 13:16:12,115 Gossiper.java (line 697) InetAddress /10.2.128.55 is now dead.
> ERROR [Thread-734] 2012-02-22 13:16:48,189 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-734,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>     at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:490)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:136)
> ERROR [Thread-68450] 2012-02-22 13:16:48,189 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-68450,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>     at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:490)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:136)
> ERROR [Thread-731] 2012-02-22 13:16:48,189 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-731,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>     at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:490)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:136)
> ERROR [Thread-736] 2012-02-22 13:16:48,186 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-736,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>     at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:490)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:136)
> ERROR [Thread-723] 2012-02-22 13:16:47,746 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-723,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>     at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:490)