Can you share your schema and cfstats? This sounds kinda like a wide-partition, backed-up-compactions, or tombstone issue for it to generate that much garbage and get into trouble this quickly with those settings.

A heap dump would be most telling, but they are rather large and hard to share.
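If grabbing stats is easier than wrangling a heap dump, this is roughly what I'd pull first (keyspace/table names are placeholders; cfstats is the older alias that tablestats replaced):

    nodetool tablestats <keyspace>.<table>        # max/mean partition size, avg tombstones per slice
    nodetool compactionstats                      # are pending compactions backing up?
    nodetool tablehistograms <keyspace> <table>   # partition size and SSTables-per-read percentiles

Chris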
On Mon, Oct 9, 2017 at 8:12 AM, Gustavo Scudeler <scudel...@gmail.com> wrote:

> Hello,
>
> @kurt greaves:
>
>> Have you tried CMS with that sized heap?
>
> Yes, for testing purposes, I have 3 nodes with CMS and 3 with G1. The
> behavior is basically the same.
>
> *Using CMS suggested settings*
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTAtNDk=
>
> *Using G1 suggested settings*
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3
>
> @Steinmaurer, Thomas:
>
>> If this happens very frequently within a short period of time, then
>> depending on your allocation rate in MB/s, a combination of the G1 bug
>> and a small heap might result in going towards OOM.
>
> We have a really high object allocation rate:
>
> Avg creation rate: 622.9 MB/sec
> Avg promotion rate: 18.39 MB/sec
>
> This could be the cause: the GC can't keep up with this rate.
>
> I'm starting to think Cassandra could be configured in a way that bursts
> allocations faster than G1 can keep up with.
>
> Any ideas?
>
> Best regards,
>
> 2017-10-09 12:44 GMT+01:00 Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com>:
>
>> Hi,
>>
>> although it is not happening here with Cassandra (due to using CMS), we
>> had some weird problems with our server application, e.g. being hit by
>> the following JVM/G1 bugs:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8140597
>> https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less a duplicate of the above)
>> https://bugs.openjdk.java.net/browse/JDK-8048556
>>
>> Especially the first, JDK-8140597, might be interesting if you see
>> periodic humongous allocations (according to a GC log) resulting in
>> mixed GC phases being steadily interrupted due to the G1 bug, and thus
>> no GC in OLD regions. Humongous allocations happen if a single (?)
>> allocation is > (region size / 2), if I remember correctly. I can't
>> recall the default G1 region size for a 12GB heap, but it is possibly
>> 4MB. So, in case you are allocating something larger than 2MB, you
>> might end up in something called "humongous" allocations, spanning
>> several G1 regions. If this happens very frequently within a short
>> period of time, then depending on your allocation rate in MB/s, a
>> combination of the G1 bug and a small heap might result in going
>> towards OOM.
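For what it's worth, the back-of-envelope math supports the 4MB guess, assuming stock JDK 8 G1 defaults (no explicit -XX:G1HeapRegionSize set):

    region size         = heap size / 2048 target regions
                        = 12288 MB / 2048 = 6 MB, rounded down to a power of two = 4 MB
    humongous threshold = region size / 2 = 2 MB

So any single allocation of 2MB or more goes straight into humongous regions. If the GC logs show frequent pauses with cause "G1 Humongous Allocation", raising the region size (e.g. -XX:G1HeapRegionSize=16m, which lifts the threshold to 8MB) would be one experiment worth trying.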
>> Possibly worth a further route for investigation.
>>
>> Regards,
>> Thomas
>>
>> *From:* Gustavo Scudeler [mailto:scudel...@gmail.com]
>> *Sent:* Montag, 09. Oktober 2017 13:12
>> *To:* user@cassandra.apache.org
>> *Subject:* Cassandra and G1 Garbage collector stop the world event (STW)
>>
>> Hi guys,
>>
>> We have a 6-node Cassandra cluster under heavy utilization. We have
>> been dealing a lot with garbage collector stop-the-world (STW) events,
>> which can take up to 50 seconds on our nodes; in the meantime the
>> Cassandra node is unresponsive, not even accepting new logins.
>>
>> Extra details:
>>
>> · Cassandra version: 3.11
>> · Heap size: 12 GB
>> · We are using the G1 garbage collector with default settings
>> · Node size: 4 CPUs, 28 GB RAM
>> · All CPU cores are at 100% all the time.
>> · The G1 GC behavior is the same across all nodes.
>>
>> The behavior is basically:
>>
>> 1. Old gen starts to fill up.
>> 2. GC can't clean it properly without a full GC and an STW event.
>> 3. The full GC starts to take longer, until the node is completely
>> unresponsive.
>>
>> *Extra details and GC reports:*
>>
>> https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw
>>
>> Can someone point me to what configurations or events I could check?
>>
>> Thanks!
>>
>> Best regards,
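On the "what to check" question: with only 4 cores, all pinned at 100%, G1's concurrent phases are starved of CPU, so the old gen fills faster than it can be reclaimed and a full GC becomes the only way out. If you stay on G1, these are the knobs I'd review in conf/jvm.options (values are illustrative starting points, not tested recommendations for your workload):

    -XX:+UseG1GC
    # main G1 tunable; Cassandra's bundled jvm.options suggests 500ms over the 200ms JVM default
    -XX:MaxGCPauseMillis=500
    # from the bundled G1 section; shifts remembered-set work off the STW pause
    -XX:G1RSetUpdatingPauseTimePercent=5
    # lifts the humongous threshold to 8MB (see the region math above)
    -XX:G1HeapRegionSize=16m

And keep detailed GC logging on (-Xloggc:... with -XX:+PrintGCDetails -XX:+PrintGCDateStamps) so you can confirm or rule out humongous allocations before changing anything else.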