Oh, and my startup command that cassandra logged was a2.bigde.nrel.gov: xss = -ea -javaagent:/opt/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8021M -Xmx8021M -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
And I remember from docs you don't want to go above 8G or java GC doesn't work out so well. I am not sure why this is not working out though. Dean On 2/20/13 7:16 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote: >Here is the printout before that log which is probably important as >wellŠ.. > > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 GCInspector.java (line >122) GC for ConcurrentMarkSweep: 3618 ms for 2 collections, 7038159096 >used; max is 8243904512 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 StatusLogger.java (line >57) Pool Name Active Pending Blocked > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 StatusLogger.java (line >72) ReadStage 11 264 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line >72) RequestResponseStage 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line >72) ReadRepairStage 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line >72) MutationStage 12 88 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line >72) ReplicateOnWriteStage 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line >72) GossipStage 1 7 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line >72) AntiEntropyStage 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line >72) MigrationStage 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line >72) StreamStage 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line >72) MemtablePostFlusher 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line >72) FlushWriter 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line >72) MiscStage 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line >72) commitlog_archiver 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line >72) InternalResponseStage 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line >72) HintedHandoff 0 0 0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line >77) CompactionManager 4 5 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line >89) MessagingService n/a 10,127 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line >99) Cache Type Size Capacity > KeysToSave >Provider > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line >100) KeyCache 1310719 1310719 > all > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line >106) RowCache 0 0 > all >org.apache.cassandra.cache.SerializingCacheProvider > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line >113) ColumnFamily Memtable ops,data > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line >116) databus5.ServersThatLog 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line >116) databus5.DecimalIndice 971,885311 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line >116) databus5.EntityGroupXref 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line >116) databus5.IntegerIndice 3256,620947 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line >116) databus5.KeyToTableName 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line >116) databus5.SdiTable 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line >116) databus5.DboTableMeta 339,41537 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line >116) databus5.nreldata 4528,3261763 > INFO [GossipTasks:1] 2013-02-20 07:14:00,375 Gossiper.java (line 830) >InetAddress /10.10.43.5 is now dead. > INFO [GossipTasks:1] 2013-02-20 07:14:00,380 Gossiper.java (line 830) >InetAddress /10.10.43.8 is now dead. > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,380 StatusLogger.java (line >116) databus5.DboColumnMeta 809,112672 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,380 StatusLogger.java (line >116) databus5.DboDatabaseMeta 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line >116) databus5.WebNodeDbo 2310,4470 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line >116) databus5.User 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line >116) databus5.RoleMapping 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line >116) databus5.LogEvent 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line >116) databus5.SecureResourceGroupXref 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line >116) databus5.SecurityGroup 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line >116) databus5.StreamAggregation 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line >116) databus5.AppProperty 0,0 > INFO [GossipTasks:1] 2013-02-20 07:14:00,382 Gossiper.java (line 830) >InetAddress /10.10.43.3 is now dead. > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line >116) databus5.StringIndice 290,27231 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line >116) databus5.MonitorDbo 75,2983 > INFO [GossipTasks:1] 2013-02-20 07:14:00,382 Gossiper.java (line 830) >InetAddress /10.10.43.7 is now dead. > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line >116) databus5.SdiColumn 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line >116) system.NodeIdInfo 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line >116) system.IndexInfo 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line >116) system.LocationInfo 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line >116) system.Versions 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line >116) system.schema_keyspaces 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line >116) system.Migrations 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line >116) system.schema_columnfamilies 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line >116) system.schema_columns 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,384 StatusLogger.java (line >116) system.HintsColumnFamily 0,0 > INFO [ScheduledTasks:1] 2013-02-20 07:14:00,384 StatusLogger.java (line >116) system.Schema 0,0 > WARN [ScheduledTasks:1] 2013-02-20 07:14:00,384 GCInspector.java (line >145) Heap is 0.8537409774403752 full. You may need to reduce memtable >and/or cache sizes. Cassandra will now flush up to the two largest >memtables to free up memory. Adjust flush_largest_memtables_at threshold >in cassandra.yaml if you don't want Cassandra to do this automatically > WARN [ScheduledTasks:1] 2013-02-20 07:14:00,384 StorageService.java (line >2855) Flushing CFS(Keyspace='databus5', ColumnFamily='nreldata') to >relieve memory pressure > > > >On 2/20/13 7:10 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote: > >>Cassandra version 1.1.4 >> >>I captured all the logs of node causing timeouts (in a 6 node cluster). >>We seem to get these slowdowns every once in a while and it causes our >>whole website to be 10 times slower. Since PlayOrm actually logs the >>rows being accessed we know exactly which row the timeout occurred on and >>asytanax logged the node which was nice. >> >>Each time we get this website being 10 times slower, this is in the logs >> >> WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 GCInspector.java (line >>145) Heap is 0.8532751700072093 full. You may need to reduce memtable >>and/or cache sizes. Cassandra will now flush up to the two largest >>memtables to free up memory. Adjust flush_largest_memtables_at threshold >>in cassandra.yaml if you don't want Cassandra to do this automatically >> WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 StorageService.java >>(line 2855) Flushing CFS(Keyspace='databus5', >>ColumnFamily='IntegerIndice') to relieve memory pressure >> >>I seem to remember reading about we should turn off swap which I have not >>gotten around to at this point :(Šwill that help in that the node is >>taken out of the cluster or? As you can see from top, this node has been >>running a 5,000,000 hours below(What's up with thatŠ.maybe the h does not >>actually mean hours) though I know it has been running for months. >> >>Here is a top commandŠ. >> >>top - 07:02:37 up 148 days, 21:11, 1 user, load average: 1.10, 1.09, >>1.09 >>Tasks: 379 total, 2 running, 377 sleeping, 0 stopped, 0 zombie >>Cpu(s): 15.9%us, 1.0%sy, 2.9%ni, 79.9%id, 0.1%wa, 0.0%hi, 0.2%si, >>0.0%st >>Mem: 32854680k total, 32582688k used, 271992k free, 44k buffers >>Swap: 33554424k total, 52180k used, 33502244k free, 22653528k cached >> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 701 cassandr 20 0 63.3g 10g 1.9g S 838.4 32.2 5125794h java >> >>What can I dump to see why we need to "relieve memory pressure"? >> >>I ran jmap like so but it can't attach for some reason >> >>[cassandra@a4 ~]$ jmap 701 -heap:format=b >>Attaching to core -heap:format=b from executable 701, please wait... >>Error attaching to core file: Can't attach to the core file >> >>Jstack is not helping much either Š. >> >>[cassandra@a4 ~]$ jstack -l 701 > threads.txt >>701: Unable to open socket file: target process not responding or HotSpot >>VM not loaded >>The -F option can be used when the target process is not responding >>[cassandra@a4 ~]$ jstack -F -l 701 > threads.txt >>Attaching to process ID 701, please wait... >>Debugger attached successfully. >>Server compiler detected. >>JVM version is 20.7-b02 >>java.lang.NullPointerException >>at >>sun.jvm.hotspot.oops.InstanceKlass.computeSubtypeOf(InstanceKlass.java:42 >>6 >>) >>at sun.jvm.hotspot.oops.Klass.isSubtypeOf(Klass.java:137) >>at sun.jvm.hotspot.oops.Oop.isA(Oop.java:100) >>at >>sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:93) >>at >>sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39) >>at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52) >>at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) >>at sun.jvm.hotspot.tools.JStack.run(JStack.java:60) >>at sun.jvm.hotspot.tools.Tool.start(Tool.java:221) >>at sun.jvm.hotspot.tools.JStack.main(JStack.java:86) >>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>at >>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java >>: >>39) >>at >>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI >>m >>pl.java:25) >>at java.lang.reflect.Method.invoke(Method.java:597) >>at sun.tools.jstack.JStack.runJStackTool(JStack.java:118) >>at sun.tools.jstack.JStack.main(JStack.java:84) >> >>Thanks, >>Dean >