> CassandraDaemon.java (line 83) Uncaught exception in thread
> Thread[pool-1-thread-37895,5,main]
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:296)
>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:203)
>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1116)
>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
Did someone send garbage on the wrong port, causing thrift to try to read some huge string in the RPC layer? There is a bug filed about this upstream with thrift, but I couldn't find it just now.

> Is there a problem with Garbage Collection? Should I restart my
> servers every few days?

No. The CMS collector will be subject to some delayed, slow growth as a result of fragmentation in old space (just as malloc would), but in general you should expect memory use to stabilize. Most likely you are either triggering an out-of-memory condition through some kind of explosive memory use (such as the garbage-on-thrift-port case above, or a huge mutation request), or you are legitimately using too much memory, in which case you may want to look into adjusting cache sizes and memtable flushing thresholds.

If your version of cassandra logs GCs (I'm not sure whether 0.6.x does), legitimate heap growth should be obvious from the GC messages in the cassandra system log. You can also run with -XX:+PrintGC and -XX:+PrintGCDetails to get GC logs from the JVM on stdout, and with -Xloggc:path/to/log to redirect that GC output to a file.

You may want to use something like VisualVM or JConsole to attach to cassandra and monitor memory usage, if you prefer that to looking at the log output.

-- 
/ Peter Schuller
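
P.S. A rough sketch of what enabling that GC logging might look like, assuming a 0.6-style install where bin/cassandra.in.sh builds up the JVM_OPTS variable (the script location and log path are just examples; adjust to your setup):

    # append GC logging flags to the options Cassandra passes to the JVM
    JVM_OPTS="$JVM_OPTS -XX:+PrintGC -XX:+PrintGCDetails -Xloggc:/var/log/cassandra/gc.log"

Restart the node and watch that log; if heap usage after each full GC keeps climbing, that points at legitimate memory pressure rather than fragmentation.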