Cassandra version 1.1.4 I captured all the logs of node causing timeouts (in a 6 node cluster). We seem to get these slowdowns every once in a while and it causes our whole website to be 10 times slower. Since PlayOrm actually logs the rows being accessed we know exactly which row the timeout occurred on and asytanax logged the node which was nice.
Each time we get this website being 10 times slower, this is in the logs WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 GCInspector.java (line 145) Heap is 0.8532751700072093 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 StorageService.java (line 2855) Flushing CFS(Keyspace='databus5', ColumnFamily='IntegerIndice') to relieve memory pressure I seem to remember reading about we should turn off swap which I have not gotten around to at this point :(…will that help in that the node is taken out of the cluster or? As you can see from top, this node has been running a 5,000,000 hours below(What's up with that….maybe the h does not actually mean hours) though I know it has been running for months. Here is a top command…. top - 07:02:37 up 148 days, 21:11, 1 user, load average: 1.10, 1.09, 1.09 Tasks: 379 total, 2 running, 377 sleeping, 0 stopped, 0 zombie Cpu(s): 15.9%us, 1.0%sy, 2.9%ni, 79.9%id, 0.1%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 32854680k total, 32582688k used, 271992k free, 44k buffers Swap: 33554424k total, 52180k used, 33502244k free, 22653528k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 701 cassandr 20 0 63.3g 10g 1.9g S 838.4 32.2 5125794h java What can I dump to see why we need to "relieve memory pressure"? I ran jmap like so but it can't attach for some reason [cassandra@a4 ~]$ jmap 701 -heap:format=b Attaching to core -heap:format=b from executable 701, please wait... Error attaching to core file: Can't attach to the core file Jstack is not helping much either …. [cassandra@a4 ~]$ jstack -l 701 > threads.txt 701: Unable to open socket file: target process not responding or HotSpot VM not loaded The -F option can be used when the target process is not responding [cassandra@a4 ~]$ jstack -F -l 701 > threads.txt Attaching to process ID 701, please wait... Debugger attached successfully. Server compiler detected. JVM version is 20.7-b02 java.lang.NullPointerException at sun.jvm.hotspot.oops.InstanceKlass.computeSubtypeOf(InstanceKlass.java:426) at sun.jvm.hotspot.oops.Klass.isSubtypeOf(Klass.java:137) at sun.jvm.hotspot.oops.Oop.isA(Oop.java:100) at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:93) at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39) at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52) at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) at sun.jvm.hotspot.tools.JStack.run(JStack.java:60) at sun.jvm.hotspot.tools.Tool.start(Tool.java:221) at sun.jvm.hotspot.tools.JStack.main(JStack.java:86) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.tools.jstack.JStack.runJStackTool(JStack.java:118) at sun.tools.jstack.JStack.main(JStack.java:84) Thanks, Dean