I believe this is https://issues.apache.org/jira/browse/CASSANDRA-6358, which was fixed in 2.0.3.
On Wed, Jan 8, 2014 at 7:15 AM, Desimpel, Ignace <ignace.desim...@nuance.com> wrote:

> Hi,
>
> On Linux with Cassandra 2.0.2, I had an OOM after a period of heavy load
> followed by some 15 days of idle running (not exactly idle, but with very,
> very low activity).
>
> Two machines of a 4-machine cluster hit this OOM.
>
> I checked the heap dump (9 GB), and it tells me:
>
> One instance of "java.util.concurrent.ScheduledThreadPoolExecutor" loaded by
> "<system class loader>" occupies 8.927.175.368 (94,53%) bytes. The instance is
> referenced by org.apache.cassandra.io.sstable.SSTableReader @ 0x7fadf89e0,
> loaded by "sun.misc.Launcher$AppClassLoader @ 0x683e6ad30". The memory is
> accumulated in one instance of "java.util.concurrent.RunnableScheduledFuture[]"
> loaded by "<system class loader>".
>
> So I checked the SSTableReader instance and found that the
> 'ScheduledThreadPoolExecutor syncExecutor' object is holding about 600k
> ScheduledFutureTasks.
>
> According to the code in SSTableReader, these tasks must have been created
> by the line syncExecutor.scheduleAtFixedRate. That means none of these tasks
> ever get executed, probably because some single initial task is blocking.
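
A periodic task submitted with scheduleAtFixedRate is put back on the executor's queue after every run and only leaves it when it is cancelled or the executor shuts down. The queue length therefore counts registered periodic tasks, not only tasks stuck behind a blocked one. The standalone sketch below illustrates this under a hypothetical assumption: one periodic task is registered per "sstable", and its future is either never cancelled or cancelled on release. The class and method names are made up for illustration and are not Cassandra's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: one periodic task per "sstable", standing in for a
// per-reader scheduleAtFixedRate call. This is not Cassandra code.
class PeriodicTaskQueueDemo {

    /**
     * Registers `count` periodic tasks on a single-threaded scheduler and
     * returns how many ScheduledFutureTasks sit in its work queue.
     */
    static int queuedAfterScheduling(int count, boolean cancelOnRelease) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        // Drop cancelled tasks from the queue right away instead of at their next fire time.
        executor.setRemoveOnCancelPolicy(true);
        List<ScheduledFuture<?>> futures = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // The task body is trivial and never blocks; its first run is 5 minutes away.
            futures.add(executor.scheduleAtFixedRate(() -> { }, 5, 5, TimeUnit.MINUTES));
        }
        if (cancelOnRelease) {
            // Simulates releasing each "sstable" and cancelling its periodic task.
            for (ScheduledFuture<?> f : futures) {
                f.cancel(false);
            }
        }
        int queued = executor.getQueue().size();
        executor.shutdownNow();
        return queued;
    }

    public static void main(String[] args) {
        System.out.println("never cancelled: " + queuedAfterScheduling(10_000, false)); // 10000
        System.out.println("cancelled:       " + queuedAfterScheduling(10_000, true));  // 0
    }
}
```

The single worker thread is idle throughout, yet without cancellation the queue still holds one entry per registered task.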
>
> But then again, the one thread that executes these tasks seems to be in a
> 'normal' state (at the time of the OOM) and is executing with the stack
> trace pasted below:
>
> Thread 0x696777eb8
>   at org.apache.cassandra.db.AtomicSortedColumns$1.create(Lorg/apache/cassandra/config/CFMetaData;Z)Lorg/apache/cassandra/db/AtomicSortedColumns; (AtomicSortedColumns.java:58)
>   at org.apache.cassandra.db.AtomicSortedColumns$1.create(Lorg/apache/cassandra/config/CFMetaData;Z)Lorg/apache/cassandra/db/ColumnFamily; (AtomicSortedColumns.java:55)
>   at org.apache.cassandra.db.ColumnFamily.cloneMeShallow(Lorg/apache/cassandra/db/ColumnFamily$Factory;Z)Lorg/apache/cassandra/db/ColumnFamily; (ColumnFamily.java:70)
>   at org.apache.cassandra.db.Memtable.resolve(Lorg/apache/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/ColumnFamily;Lorg/apache/cassandra/db/index/SecondaryIndexManager$Updater;)V (Memtable.java:187)
>   at org.apache.cassandra.db.Memtable.put(Lorg/apache/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/ColumnFamily;Lorg/apache/cassandra/db/index/SecondaryIndexManager$Updater;)V (Memtable.java:158)
>   at org.apache.cassandra.db.ColumnFamilyStore.apply(Lorg/apache/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/ColumnFamily;Lorg/apache/cassandra/db/index/SecondaryIndexManager$Updater;)V (ColumnFamilyStore.java:840)
>   at org.apache.cassandra.db.Keyspace.apply(Lorg/apache/cassandra/db/RowMutation;ZZ)V (Keyspace.java:373)
>   at org.apache.cassandra.db.Keyspace.apply(Lorg/apache/cassandra/db/RowMutation;Z)V (Keyspace.java:338)
>   at org.apache.cassandra.db.RowMutation.apply()V (RowMutation.java:201)
>   at org.apache.cassandra.cql3.statements.ModificationStatement.executeInternal(Lorg/apache/cassandra/service/QueryState;)Lorg/apache/cassandra/transport/messages/ResultMessage; (ModificationStatement.java:477)
>   at org.apache.cassandra.cql3.QueryProcessor.processInternal(Ljava/lang/String;)Lorg/apache/cassandra/cql3/UntypedResultSet; (QueryProcessor.java:178)
>   at org.apache.cassandra.db.SystemKeyspace.persistSSTableReadMeter(Ljava/lang/String;Ljava/lang/String;ILorg/apache/cassandra/metrics/RestorableMeter;)V (SystemKeyspace.java:938)
>   at org.apache.cassandra.io.sstable.SSTableReader$2.run()V (SSTableReader.java:342)
>   at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; (Executors.java:471)
>   at java.util.concurrent.FutureTask.runAndReset()Z (FutureTask.java:304)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)Z (ScheduledThreadPoolExecutor.java:178)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V (ScheduledThreadPoolExecutor.java:293)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run()V (Thread.java:724)
>
> Since each of these tasks is throttled by meterSyncThrottle.acquire(), I
> suspect that the RateLimiter is causing a delay. The RateLimiter instance
> attributes are:
>
> Type   | Name                 | Value
> long   | nextFreeTicketMicros | 3016022567383
> double | maxPermits           | 100.0
> double | storedPermits        | 99.0
> long   | offsetNanos          | 334676357831746
>
> I guess that these attributes effectively result in blocking behavior,
> which in turn leads to the OOM…
>
> Can anyone make sense of this?
>
> I hope this helps in finding the cause, so that it can perhaps be avoided
> in the future. I still have the heap dump, so I can always provide more
> information if needed.
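
Whether a rate limiter delays a caller depends mostly on its banked (stored) permits. The field names in the dump appear to be those of Guava's RateLimiter. The sketch below is a deliberately simplified token bucket that reuses those names, but it is not Guava's source. The 100 permits/second rate is an assumption chosen to match maxPermits = 100.0, and the class name is hypothetical. In this model, a caller only waits after the stored permits run out and an earlier caller has pushed nextFreeTicketMicros past the current time.

```java
// Simplified token-bucket limiter, loosely modelled on the field names seen
// in the heap dump. Hypothetical sketch, not Guava's RateLimiter. Time is
// passed in explicitly so the behaviour is deterministic.
class TokenBucketSketch {
    private final double stableIntervalMicros; // time between fresh permits
    private final double maxPermits;           // cap on banked permits
    private double storedPermits;              // permits banked while idle
    private long nextFreeTicketMicros;         // earliest time the next caller is served

    TokenBucketSketch(double permitsPerSecond, double maxPermits,
                      double storedPermits, long nextFreeTicketMicros) {
        this.stableIntervalMicros = 1_000_000.0 / permitsPerSecond;
        this.maxPermits = maxPermits;
        this.storedPermits = Math.min(storedPermits, maxPermits);
        this.nextFreeTicketMicros = nextFreeTicketMicros;
    }

    /** Takes one permit at time nowMicros; returns how long the caller must wait, in micros. */
    long acquireWaitMicros(long nowMicros) {
        // Idle time since the last ticket is converted into banked permits, up to the cap.
        if (nowMicros > nextFreeTicketMicros) {
            double earned = (nowMicros - nextFreeTicketMicros) / stableIntervalMicros;
            storedPermits = Math.min(maxPermits, storedPermits + earned);
            nextFreeTicketMicros = nowMicros;
        }
        // The caller waits only for the ticket already reserved by earlier callers.
        long waitMicros = nextFreeTicketMicros - nowMicros;
        // Use a banked permit if available; otherwise push the next ticket out by one interval.
        double fromStore = Math.min(1.0, storedPermits);
        storedPermits -= fromStore;
        nextFreeTicketMicros += (long) ((1.0 - fromStore) * stableIntervalMicros);
        return waitMicros;
    }

    public static void main(String[] args) {
        // Hypothetical state resembling the dump: 99 of 100 permits banked.
        TokenBucketSketch limiter = new TokenBucketSketch(100.0, 100.0, 99.0, 0L);
        long totalWait = 0;
        for (int i = 0; i < 100; i++) {
            totalWait += limiter.acquireWaitMicros(0L);
        }
        System.out.println("wait for first 100 calls: " + totalWait + " us"); // 0
        System.out.println("wait for call 101: " + limiter.acquireWaitMicros(0L) + " us"); // 10000
    }
}
```

This simplified model does not verify whether the Guava version used by Cassandra 2.0.2 behaves identically in the dumped state. It only shows what stored permits mean for a token bucket.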
>
> Regards,
>
> Ignace Desimpel

--
Tyler Hobbs
DataStax <http://datastax.com/>