On Mon, Aug 9, 2010 at 8:20 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> what does tpstats or other JMX monitoring of the o.a.c.concurrent stages show?
>
> On Mon, Aug 9, 2010 at 4:50 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> I have a 16-node 0.6.3 cluster, and two nodes from my cluster are giving
>> me major headaches.
>>
>> 10.71.71.56  Up    58.19 GB  108271662202116783829255556910108067277  |  ^
>> 10.71.71.61  Down  67.77 GB  123739042516704895804863493611552076888  v  |
>> 10.71.71.66  Up    43.51 GB  127605887595351923798765477786913079296  |  ^
>> 10.71.71.59  Down  90.22 GB  139206422831293007780471430312996086499  v  |
>> 10.71.71.65  Up    22.97 GB  148873535527910577765226390751398592512  |  ^
>>
>> The symptoms I am seeing are that nodes 61 and 59 have huge (6 GB+)
>> commit log directories. They keep growing, along with memory usage;
>> eventually the logs start showing GCInspector errors and then the
>> nodes go OOM:
>>
>> INFO 14:20:01,296 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1281378001296.log
>> INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving 7955651792 used; max is 9773776896
>> INFO 14:20:03,201 GC for ParNew: 443 ms, 45124504 reclaimed leaving 8137412920 used; max is 9773776896
>> INFO 14:20:04,314 GC for ParNew: 438 ms, 54158832 reclaimed leaving 8310139720 used; max is 9773776896
>> INFO 14:20:05,547 GC for ParNew: 409 ms, 56888760 reclaimed leaving 8480136592 used; max is 9773776896
>> INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving 8648872520 used; max is 9773776896
>> INFO 14:20:08,904 GC for ParNew: 462 ms, 59185992 reclaimed leaving 8816581312 used; max is 9773776896
>> INFO 14:20:09,973 GC for ParNew: 460 ms, 57403840 reclaimed leaving 8986063136 used; max is 9773776896
>> INFO 14:20:11,976 GC for ParNew: 447 ms, 59814376 reclaimed leaving 9153134392 used; max is 9773776896
>> INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving 9318140296 used; max is 9773776896
>> java.lang.OutOfMemoryError: Java heap space
>> Dumping heap to java_pid10913.hprof ...
>> INFO 14:22:30,620 InetAddress /10.71.71.66 is now dead.
>> INFO 14:22:30,621 InetAddress /10.71.71.65 is now dead.
>> INFO 14:22:30,621 GC for ConcurrentMarkSweep: 44862 ms, 261200 reclaimed leaving 9334753480 used; max is 9773776896
>> INFO 14:22:30,621 InetAddress /10.71.71.64 is now dead.
>>
>> Heap dump file created [12730501093 bytes in 253.445 secs]
>> ERROR 14:28:08,945 Uncaught exception in thread Thread[Thread-2288,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
>> ERROR 14:28:08,948 Uncaught exception in thread Thread[Thread-2281,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
>> INFO 14:28:09,017 GC for ConcurrentMarkSweep: 33737 ms, 85880 reclaimed leaving 9335215296 used; max is 9773776896
>>
>> Does anyone have any ideas what is going on?
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
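The ParNew lines in the quoted log show the heap climbing by roughly 170 MB per second-long interval even though every collection reclaims some memory. To put a number on that trend, here is a minimal Python sketch that is not from the thread: it assumes GCInspector messages in exactly the "GC for X: N ms, N reclaimed leaving N used; max is N" layout quoted above, and the helper names (`parse_gc_lines`, `heap_trend`) are hypothetical. The extrapolation is a simple straight line between the first and last samples, not a model of how the JVM actually behaves near its limit.

```python
import re

# Matches GCInspector messages in the layout quoted in this thread, e.g.
#   INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving 7955651792 used; max is 9773776896
# \s+ between tokens tolerates lines that a mail client has re-wrapped.
GC_LINE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s+GC\s+for\s+(\w+):\s+(\d+)\s+ms,\s+"
    r"(\d+)\s+reclaimed\s+leaving\s+(\d+)\s+used;\s+max\s+is\s+(\d+)"
)


def parse_gc_lines(text):
    """Return a list of (seconds, collector, pause_ms, used_bytes, max_bytes)."""
    samples = []
    for m in GC_LINE.finditer(text):
        h, mi, s, ms = (int(g) for g in m.group(1, 2, 3, 4))
        seconds = h * 3600 + mi * 60 + s + ms / 1000.0
        samples.append((seconds, m.group(5), int(m.group(6)),
                        int(m.group(8)), int(m.group(9))))
    return samples


def heap_trend(samples):
    """Straight-line estimate of heap growth (bytes/s) and seconds until used == max."""
    first, last = samples[0], samples[-1]
    elapsed = last[0] - first[0]
    rate = (last[3] - first[3]) / elapsed
    headroom = last[4] - last[3]
    eta = headroom / rate if rate > 0 else float("inf")
    return rate, eta


# Three of the ParNew lines from the quoted log.
SAMPLE = """
INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving 7955651792 used; max is 9773776896
INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving 8648872520 used; max is 9773776896
INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving 9318140296 used; max is 9773776896
"""

if __name__ == "__main__":
    samples = parse_gc_lines(SAMPLE)
    rate, eta = heap_trend(samples)
    print(f"{len(samples)} samples, heap growing ~{rate / 1e6:.0f} MB/s, "
          f"~{eta:.1f} s of headroom left")
```

For these three lines, the used heap grows by about 1.36 GB in 10.95 s (roughly 124 MB/s), which leaves only a few seconds of headroom before the reported max. That matches the OutOfMemoryError that follows in the log.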
Hey guys, thanks for the help. I had lowered my Xmx from 12GB to 10GB because I saw:

[r...@cdbsd09 ~]# /usr/local/cassandra/bin/nodetool --host 10.71.71.59 --port 8585 info
123739042516704895804863493611552076888
Load             : 68.91 GB
Generation No    : 1281407425
Uptime (seconds) : 1459
Heap Memory (MB) : 6476.70 / 12261.00

This was happening:

[r...@cdbsd11 ~]# /usr/local/cassandra/bin/nodetool --host cdbsd09.hadoop.pvt --port 8585 tpstats
Pool Name                    Active   Pending      Completed
STREAM-STAGE                      0         0              0
RESPONSE-STAGE                    0         0          16478
ROW-READ-STAGE                   64      4014          18190
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         0         0          60290
GMFD                              0         0            385
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0           7526
ROW-MUTATION-STAGE               64       908         182612
MESSAGE-STREAMING-POOL            0         0              0
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0              8
FLUSH-WRITER-POOL                 0         0              8
AE-SERVICE-STAGE                  0         0              0
HINTED-HANDOFF-POOL               1         9              6

After raising the level, I realized I was maxing out the heap. The other nodes are running fine with Xmx at 9GB, but I guess these nodes cannot. Thanks again.

Edward
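The tpstats output is the answer to the question about the o.a.c.concurrent stages. ROW-READ-STAGE and ROW-MUTATION-STAGE each show 64 active threads with thousands and hundreds of tasks pending behind them, while every other pool is idle. Here is a small Python sketch, not from the thread, that flags pools with a backlog. It assumes the four-column Name/Active/Pending/Completed layout that this version of nodetool printed (later releases add columns), and the helper names are hypothetical.

```python
def parse_tpstats(text):
    """Map pool name -> (active, pending, completed) from `nodetool tpstats` text."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly four fields: NAME ACTIVE PENDING COMPLETED.
        # The "Pool Name ..." header splits into five fields and is skipped.
        if len(parts) == 4 and all(p.isdigit() for p in parts[1:]):
            active, pending, completed = (int(p) for p in parts[1:])
            pools[parts[0]] = (active, pending, completed)
    return pools


def backed_up(pools):
    """Names of pools with pending tasks, largest backlog first."""
    waiting = [name for name, (_, pending, _) in pools.items() if pending > 0]
    return sorted(waiting, key=lambda name: pools[name][1], reverse=True)


# The tpstats output quoted in this message.
TPSTATS = """\
Pool Name                    Active   Pending      Completed
STREAM-STAGE                      0         0              0
RESPONSE-STAGE                    0         0          16478
ROW-READ-STAGE                   64      4014          18190
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         0         0          60290
GMFD                              0         0            385
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0           7526
ROW-MUTATION-STAGE               64       908         182612
MESSAGE-STREAMING-POOL            0         0              0
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0              8
FLUSH-WRITER-POOL                 0         0              8
AE-SERVICE-STAGE                  0         0              0
HINTED-HANDOFF-POOL               1         9              6
"""

if __name__ == "__main__":
    pools = parse_tpstats(TPSTATS)
    for name in backed_up(pools):
        active, pending, _ = pools[name]
        print(f"{name}: {pending} pending, {active} active")
```

The sketch only reports where work is queuing. It does not say why. Here, the queued reads and mutations coincide with the heap filling up, which is consistent with Edward's conclusion that these two nodes need more heap than the others.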