For the restart issue, see CASSANDRA-6008 <https://issues.apache.org/jira/browse/CASSANDRA-6008> and CASSANDRA-6086.
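For reference, the workaround reported in the ticket discussion can be sketched roughly as below. This is an unverified sketch, not an official procedure: the paths assume the default data directory and an init-script install, and the `compactions_in_progress` approach should be confirmed against CASSANDRA-6008 before running anything.

```shell
# WORKAROUND SKETCH ONLY -- confirm against CASSANDRA-6008 before running.
# Paths assume the default data directory (/var/lib/cassandra); adjust as needed.

# 1. Make sure Cassandra is fully stopped.
sudo service cassandra stop

# 2. Back up, then move aside, the system compactions_in_progress table data.
#    Per the ticket discussion, this table only tracks in-flight compactions,
#    so clearing it lets the startup consistency check pass.
mv /var/lib/cassandra/data/system/compactions_in_progress \
   /tmp/compactions_in_progress.bak

# 3. Restart; the "Unfinished compactions reference missing sstables"
#    check no longer fires because the table is empty.
sudo service cassandra start
```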
On Thu, Feb 6, 2014 at 12:19 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi Robert,
>
> The heap and GC are a bit tricky to tune.
>
> I recently read a post about the heap, explaining how it works and how to
> tune it:
> http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads
>
> There are plenty of blog posts and articles like this on the web.
>
> Be careful: tuning depends heavily on your workload and hardware. You
> shouldn't use configuration found in these posts as-is. You need to test
> incrementally and monitor how it behaves.
>
> If you are not able to find a solution, there are also plenty of
> professionals and consultants (like DataStax or Aaron Morton) whose job
> is to help with Cassandra integrations, including heap and GC tuning.
>
> Hope this helps somehow.
>
>
> 2014-01-29 16:51 GMT+01:00 Robert Wille <rwi...@fold3.com>:
>
>> Forget what I said about there not being any load during the night. I
>> forgot about my unit tests. They would have been running at this time,
>> and they run against this cluster.
>>
>> I also forgot to provide JVM information:
>>
>> java version "1.7.0_17"
>> Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
>> Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
>>
>> Thanks
>>
>> Robert
>>
>> From: Robert Wille <rwi...@fold3.com>
>> Reply-To: <user@cassandra.apache.org>
>> Date: Wednesday, January 29, 2014 at 4:06 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: GC taking a long time
>>
>> I read through the recent thread "Cassandra mad GC", which seemed very
>> similar to my situation, but it didn't really help.
>>
>> Here is what I get from my logs when I grep for GCInspector. Note that
>> this was the middle of the night on a dev server, so there should have
>> been almost no load.
>>
>> INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 350 ms for 1 collections, 8027555576 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:47:58,537 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 261 ms for 1 collections, 7650345088 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:48:10,783 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 269 ms for 1 collections, 7653016592 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:48:23,786 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 298 ms for 1 collections, 7716831032 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:48:35,988 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 308 ms for 1 collections, 7745178616 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:48:48,434 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 319 ms for 1 collections, 7796207088 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:49:00,902 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 320 ms for 1 collections, 7821378680 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:49:13,344 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 338 ms for 1 collections, 7859905288 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:49:25,471 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 352 ms for 1 collections, 7911145688 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:49:38,473 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 359 ms for 1 collections, 7938204144 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:49:50,895 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 368 ms for 1 collections, 7988088408 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:50:03,345 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 379 ms for 1 collections, 8014757368 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:50:15,920 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 384 ms for 1 collections, 8044074584 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:50:28,484 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 397 ms for 1 collections, 8094146216 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:50:41,040 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 405 ms for 1 collections, 8118675056 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:50:53,628 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 408 ms for 1 collections, 8121201008 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:51:19,816 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 26041 ms for 1 collections, 7385748352 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 02:59:38,271 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 20970 ms for 1 collections, 7379533664 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 03:00:07,020 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 25835 ms for 1 collections, 7386103248 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 03:00:28,777 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 20278 ms for 1 collections, 7380769872 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 03:00:55,514 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 24909 ms for 1 collections, 7382846344 used; max is 8126464000
>> INFO [ScheduledTasks:1] 2014-01-29 03:01:23,741 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 26223 ms for 1 collections, 7387539176 used; max is 8126464000
>>
>> There are various long GCs sprinkled throughout my logs. Sometimes it
>> recovers from them. Sometimes it goes into a death spiral.
>>
>> About a minute prior to the big meltdown at 02:51:19 there was a lot of
>> activity on the local CF. About every 10 milliseconds the following is
>> repeated:
>>
>> INFO [MutationStage:552] 2014-01-29 02:50:35,489 ColumnFamilyStore.java (line 740) Enqueuing flush of Memtable-local@1059338449(73/3672 serialized/live bytes, 2 ops)
>> INFO [FlushWriter:45] 2014-01-29 02:50:35,489 Memtable.java (line 333) Writing Memtable-local@1059338449(73/3672 serialized/live bytes, 2 ops)
>> INFO [FlushWriter:45] 2014-01-29 02:50:35,492 Memtable.java (line 373) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-104855-Data.db (93 bytes) for commitlog position ReplayPosition(segmentId=1390945513179, position=25932849)
>>
>> It seems strange to be repeatedly flushing a table at that frequency.
>> Other than that, I don't see anything odd in the logs just before GC
>> went off the rails.
>>
>> I have not done any tuning. All my JVM settings are default.
>> For the record, here's what I get from ps:
>>
>> java -ea -javaagent:/usr/local/apache-cassandra-2.0.4/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms7830M -Xmx7830M -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark …
>>
>> I have a half-dozen keyspaces with a half-dozen tables per keyspace, and
>> a total of several gigabytes of data. Just a one-node cluster. Nothing
>> really remarkable. Any tables with records that could get really wide
>> have been sharded. No record should be bigger than a few MB.
>>
>> On an unrelated note, I had to kill my server because it wouldn't do
>> anything but GC. Now I cannot start it. It errors out with the following:
>>
>> java.lang.IllegalStateException: Unfinished compactions reference missing sstables. This should never happen since compactions are marked finished before we start removing the old sstables.
>>         at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:495)
>>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264)
>>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
>>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
>> java.lang.IllegalStateException: Unfinished compactions reference missing sstables. This should never happen since compactions are marked finished before we start removing the old sstables.
>>         at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:495)
>>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:264)
>>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
>>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
>> Exception encountered during startup: Unfinished compactions reference
>> missing sstables. This should never happen since compactions are marked
>> finished before we start removing the old sstables.
>>
>> I have a single node in my cluster. Is there any way to recover, or is
>> my data toast? I can restore my data, so this isn't a primary issue for
>> me. The GC issue, on the other hand, is preventing me from moving to
>> production.
>>
>> Robert
>>
>>
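A back-of-the-envelope check on the numbers quoted above suggests why the node never recovers. This is an editor's sketch, not a diagnosis: GCInspector reports total heap used, while CMSInitiatingOccupancyFraction applies to old-gen occupancy, so the comparison is only approximate.

```python
# Rough arithmetic on the GCInspector figures from the thread above.
# Assumption: with -XX:CMSInitiatingOccupancyFraction=75 and
# -XX:+UseCMSInitiatingOccupancyOnly (see the ps output), CMS kicks off
# a new cycle whenever occupancy crosses ~75% of capacity.

max_heap = 8126464000          # the "max is" value from every log line
cms_trigger = max_heap * 0.75  # approximate CMS initiation point

# Heap used immediately AFTER the 26-second collection at 02:51:19:
used_after_full_gc = 7385748352

print(f"CMS trigger point:      {cms_trigger / 1e9:.2f} GB")
print(f"Live set after full GC: {used_after_full_gc / 1e9:.2f} GB")

# The live set (~7.39 GB) never drops below the trigger (~6.09 GB), so a
# new CMS cycle starts immediately after each one finishes -- consistent
# with the back-to-back 20-26 second pauses from 02:51 onward.
print(used_after_full_gc > cms_trigger)  # True
```

If this reading is right, the heap is simply too small for the live data set (or something is retaining far more than expected), and no amount of collector tuning will help until the live set shrinks or the heap grows.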