Hi,

my previously mentioned G1 bug does not seem to be related to your case.

Thomas

From: Gustavo Scudeler [mailto:scudel...@gmail.com]
Sent: Monday, 9 October 2017 15:13
To: user@cassandra.apache.org
Subject: Re: Cassandra and G1 Garbage collector stop the world event (STW)

Hello,

@kurt greaves:

> Have you tried CMS with that sized heap?

Yes, for testing purposes I have 3 nodes with CMS and 3 with G1. The behavior is basically the same.

Using CMS suggested settings:
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTAtNDk=

Using G1 suggested settings:
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3

@Steinmaurer, Thomas:

> If this happens very frequently within a very short time, then depending on your allocation rate in MB/s, a combination of the G1 bug and a small heap might result in going towards OOM.

We have a really high object allocation rate:

Avg creation rate: 622.9 mb/sec
Avg promotion rate: 18.39 mb/sec

That could be the cause: the GC can't keep up with this rate.

I'm starting to think it could be a misconfiguration, where Cassandra is set up in a way that bursts allocations faster than G1 can keep up with.

Any ideas?

Best regards,

2017-10-09 12:44 GMT+01:00 Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com>:

Hi,

although not happening here with Cassandra (due to using CMS), we had some weird problems with our server application, e.g. it was hit by the following JVM/G1 bugs:

https://bugs.openjdk.java.net/browse/JDK-8140597
https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less a duplicate of above)
https://bugs.openjdk.java.net/browse/JDK-8048556

Especially the first, JDK-8140597, might be interesting if you see periodic humongous allocations (according to a GC log) resulting in mixed GC phases being steadily interrupted due to the G1 bug, thus no GC in OLD regions. Humongous allocations will happen if a single (?) allocation is > (region size / 2), if I remember correctly.
Can’t recall the default G1 region size for a 12GB heap, but possibly 4MB. So, in case you are allocating something larger than 2MB, you might end up with so-called “humongous” allocations, spanning several G1 regions. If this happens very frequently within a very short time, then depending on your allocation rate in MB/s, a combination of the G1 bug and a small heap might result in going towards OOM.

Possibly worth a further route for investigation.

Regards,
Thomas

From: Gustavo Scudeler [mailto:scudel...@gmail.com]
Sent: Monday, 9 October 2017 13:12
To: user@cassandra.apache.org
Subject: Cassandra and G1 Garbage collector stop the world event (STW)

Hi guys,

We have a 6 node Cassandra cluster under heavy utilization. We have been dealing a lot with garbage collector stop-the-world events, which can take up to 50 seconds on our nodes. During that time the Cassandra node is unresponsive, not even accepting new logins.

Extra details:

• Cassandra Version: 3.11
• Heap Size = 12 GB
• We are using G1 Garbage Collector with default settings
• Nodes size: 4 CPUs 28 GB RAM
• All CPU cores are at 100% all the time.
• The G1 GC behavior is the same across all nodes.

The behavior remains basically:

1. Old Gen starts to fill up.
2. GC can't clean it properly without a full GC and a STW event.
3. The full GC starts to take longer, until the node is completely unresponsive.

Extra details and GC reports:
https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw

Can someone point me to what configurations or events I could check?

Thanks!

Best regards,

The contents of this e-mail are intended for the named addressee only. It contains information that may be confidential. Unless you are the named addressee or an authorized designee, you may not copy or use it, or disclose it to anyone else. If you received it in error please notify us immediately and then destroy it.
Dynatrace Austria GmbH (registration number FN 91482h) is a company registered in Linz whose registered office is at 4040 Linz, Austria, Freistädterstraße 313
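
Editorial note on the region-size estimate in the thread (a 12GB heap, "possibly 4MB" regions, humongous allocations above roughly 2MB): the arithmetic can be checked with a small sketch. It assumes the commonly described JDK 8-era G1 heuristic, which aims for about 2048 regions, rounds the region size down to a power of two, and clamps it to the range 1MB to 32MB. The class and method names are hypothetical, this is not the HotSpot source, and the exact rule differs between JDK versions (HotSpot also bases it on the average of initial and maximum heap size, which is assumed here to be the same value).

```java
// Illustrative sketch only: approximates the JDK 8-era G1 region sizing heuristic.
// Names are hypothetical; constants are assumptions stated in the text above.
final class G1RegionSketch {
    static final long MIN_REGION = 1L << 20;   // assumed lower bound: 1 MB
    static final long MAX_REGION = 32L << 20;  // assumed upper bound: 32 MB
    static final long TARGET_REGIONS = 2048;   // assumed target region count

    // Estimated region size in bytes for a heap with -Xms equal to -Xmx.
    static long regionSizeBytes(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        long pow2 = Long.highestOneBit(raw);   // round down to a power of two
        return Math.min(Math.max(pow2, MIN_REGION), MAX_REGION);
    }

    // Allocations around half a region or larger are treated as humongous.
    static long humongousThresholdBytes(long heapBytes) {
        return regionSizeBytes(heapBytes) / 2;
    }

    public static void main(String[] args) {
        long heap = 12L << 30; // the 12 GB heap from the thread
        System.out.println("region size MB: " + (regionSizeBytes(heap) >> 20));
        System.out.println("humongous threshold MB: " + (humongousThresholdBytes(heap) >> 20));
    }
}
```

Under these assumptions a 12GB heap gives 4MB regions and a humongous threshold of 2MB, which matches the guess in the thread. If single allocations regularly exceed that size, the region size can be set explicitly with the `-XX:G1HeapRegionSize` JVM option.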