Hi,

I am setting up a cluster on a Linux box. Everything seems to be working great, and I am watching the ring with:

watch -d -n 2 nodetool -h localhost ring

Suddenly, at 14:07, I see one of the nodes go down: its status changes from Up to Down. Thirteen minutes later, without any intervention, the node comes back Up by itself. I check the logs on that node (see the extract at the end) and there is nothing in them from 14:07 until 14:20. I also notice that the GC for ConcurrentMarkSweep took 13 minutes.

Here are my questions:

[1] Is this behavior normal?
[2] Has anyone else observed it before?
[3] While the node is down, nodetool and any other client won't be able to connect to it (clients should use other nodes in the cluster to write data). Correct?
[4] Is a ConcurrentMarkSweep GC a stop-the-world situation, where the JVM cannot do anything else, so that the node is technically Down? Correct?
[5] Why is this GC taking such a long time? (See the JVM args posted below.)
[6] Are there any JVM args (switches) I can use to prevent this?

----------------------
JVM_OPTS=" \
        -Dprog=Cassandra \
        -ea \
        -Xms12G \
        -Xmx12G \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:SurvivorRatio=8 \
        -XX:MaxTenuringThreshold=1 \
        -XX:+HeapDumpOnOutOfMemoryError \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"
--------------------
#### Log Extract ######

INFO [GC inspection] 2010-08-22 14:06:48,622 GCInspector.java (line 116) GC for ParNew: 235 ms, 134504976 reclaimed leaving 12721498296 used; max is 13005881344
INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java (line 357)HintsColumnFamily has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/nes/data1/cassandra_commitlog/CommitLog-1282500306160.log', position=55517352)
INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java (line 609) Enqueuing flush of memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations)
INFO [FLUSH-WRITER-POOL:1] 2010-08-22 14:19:45,430 Memtable.java (line 148) Writing memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations)
INFO [GC inspection] 2010-08-22 14:19:45,917 GCInspector.java (line 116) GC for ParNew: 215 ms, 130254256 reclaimed leaving 12742982208 used; max is 13005881344
INFO [GC inspection] 2010-08-22 14:19:45,973 GCInspector.java (line 116)GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed leaving 196692400 used; max is 13005881344
--------------
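To put the last GCInspector line into numbers, here is a minimal Java sketch (Java 16+ for the record type) that parses a GCInspector message and converts it into a pause length in minutes and an approximate heap occupancy. The class, record and method names are hypothetical; the regex assumes the message format exactly as it appears in the extract above. It also assumes that "reclaimed" plus "leaving ... used" approximates how full the heap was just before the collection, which is an interpretation of the log text, not something the log states.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLineParser {
    // Matches the GCInspector message format seen in the log extract, e.g.
    // "GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed leaving 196692400 used; max is 13005881344"
    // find() is used, so the timestamp/prefix before "GC for" is ignored.
    private static final Pattern GC_LINE = Pattern.compile(
            "GC for (\\S+): (\\d+) ms, (\\d+) reclaimed leaving (\\d+) used; max is (\\d+)");

    // One parsed GC event; all sizes are in bytes as printed in the log.
    record GcEvent(String collector, long durationMs, long reclaimed, long usedAfter, long max) {
        double durationMinutes() {
            return durationMs / 60000.0;
        }

        // Assumption: occupancy before the GC ~= bytes left afterwards + bytes reclaimed.
        double occupancyBeforePercent() {
            return 100.0 * (usedAfter + reclaimed) / max;
        }
    }

    static GcEvent parse(String line) {
        Matcher m = GC_LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a GCInspector line: " + line);
        }
        return new GcEvent(
                m.group(1),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Long.parseLong(m.group(5)));
    }

    public static void main(String[] args) {
        GcEvent cms = parse("INFO [GC inspection] 2010-08-22 14:19:45,973 GCInspector.java (line 116)"
                + "GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed leaving 196692400 used;"
                + " max is 13005881344");
        System.out.printf("%s pause: %d ms (%.1f min); heap %.1f%% full before collection%n",
                cms.collector(), cms.durationMs(), cms.durationMinutes(), cms.occupancyBeforePercent());
    }
}
```

For the ConcurrentMarkSweep line this works out to a pause of roughly 12.9 minutes, with the heap at about 99% of its maximum beforehand, which lines up with the ~13-minute gap in the log and with the ParNew lines reporting about 12.7 GB used out of about 13.0 GB max.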