Can you share your schema and cfstats? This sounds kinda like a wide-partition, backed-up-compactions, or tombstone issue for it to generate that much garbage and get into trouble this quickly with those settings.

A heap dump would be most telling, but they are rather large and hard to share.
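If grabbing stats is easier than wrangling a heap dump, this is roughly what I'd pull first (keyspace/table names are placeholders; cfstats is the older alias that tablestats replaced):

    nodetool tablestats <keyspace>.<table>        # max/mean partition size, avg tombstones per slice
    nodetool compactionstats                      # are pending compactions backing up?
    nodetool tablehistograms <keyspace> <table>   # partition size and SSTables-per-read percentiles

Chris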
On Mon, Oct 9, 2017 at 8:12 AM, Gustavo Scudeler <scudel...@gmail.com> wrote:

> Hello,
>
> @kurt greaves:
>
>> Have you tried CMS with that sized heap?
>
> Yes, for testing purposes, I have 3 nodes with CMS and 3 with G1. The
> behavior is basically the same.
>
> *Using CMS suggested settings*
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTAtNDk=
>
> *Using G1 suggested settings*
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3
>
> @Steinmaurer, Thomas:
>
>> If this happens very frequently within a short period of time, then
>> depending on your allocation rate in MB/s, a combination of the G1 bug
>> and a small heap might result in going towards OOM.
>
> We have a really high object allocation rate:
>
> Avg creation rate: 622.9 MB/sec
> Avg promotion rate: 18.39 MB/sec
>
> This could be the cause: the GC can't keep up with this rate.
>
> I'm starting to think Cassandra could be configured in a way that bursts
> allocations faster than G1 can keep up with.
>
> Any ideas?
>
> Best regards,
>
> 2017-10-09 12:44 GMT+01:00 Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com>:
>
>> Hi,
>>
>> although it is not happening here with Cassandra (due to using CMS), we
>> had some weird problems with our server application, e.g. being hit by
>> the following JVM/G1 bugs:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8140597
>> https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less a duplicate of the above)
>> https://bugs.openjdk.java.net/browse/JDK-8048556
>>
>> Especially the first, JDK-8140597, might be interesting if you see
>> periodic humongous allocations (according to a GC log) resulting in
>> mixed GC phases being steadily interrupted due to the G1 bug, and thus
>> no GC in OLD regions. Humongous allocations happen if a single (?)
>> allocation is > (region size / 2), if I remember correctly. I can't
>> recall the default G1 region size for a 12GB heap, but it is possibly
>> 4MB. So, in case you are allocating something larger than 2MB, you
>> might end up in something called "humongous" allocations, spanning
>> several G1 regions. If this happens very frequently within a short
>> period of time, then depending on your allocation rate in MB/s, a
>> combination of the G1 bug and a small heap might result in going
>> towards OOM.
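For what it's worth, the back-of-envelope math supports the 4MB guess, assuming stock JDK 8 G1 defaults (no explicit -XX:G1HeapRegionSize set):

    region size         = heap size / 2048 target regions
                        = 12288 MB / 2048 = 6 MB, rounded down to a power of two = 4 MB
    humongous threshold = region size / 2 = 2 MB

So any single allocation of 2MB or more goes straight into humongous regions. If the GC logs show frequent pauses with cause "G1 Humongous Allocation", raising the region size (e.g. -XX:G1HeapRegionSize=16m, which lifts the threshold to 8MB) would be one experiment worth trying.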
>> Possibly worth a further route for investigation.
>>
>> Regards,
>> Thomas
>>
>> *From:* Gustavo Scudeler [mailto:scudel...@gmail.com]
>> *Sent:* Montag, 09. Oktober 2017 13:12
>> *To:* user@cassandra.apache.org
>> *Subject:* Cassandra and G1 Garbage collector stop the world event (STW)
>>
>> Hi guys,
>>
>> We have a 6-node Cassandra cluster under heavy utilization. We have
>> been dealing a lot with garbage collector stop-the-world (STW) events,
>> which can take up to 50 seconds on our nodes; in the meantime the
>> Cassandra node is unresponsive, not even accepting new logins.
>>
>> Extra details:
>>
>> · Cassandra version: 3.11
>> · Heap size: 12 GB
>> · We are using the G1 garbage collector with default settings
>> · Node size: 4 CPUs, 28 GB RAM
>> · All CPU cores are at 100% all the time.
>> · The G1 GC behavior is the same across all nodes.
>>
>> The behavior is basically:
>>
>> 1. Old gen starts to fill up.
>> 2. GC can't clean it properly without a full GC and an STW event.
>> 3. The full GC starts to take longer, until the node is completely
>> unresponsive.
>>
>> *Extra details and GC reports:*
>>
>> https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw
>>
>> Can someone point me to what configurations or events I could check?
>>
>> Thanks!
>>
>> Best regards,
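On the "what to check" question: with only 4 cores, all pinned at 100%, G1's concurrent phases are starved of CPU, so the old gen fills faster than it can be reclaimed and a full GC becomes the only way out. If you stay on G1, these are the knobs I'd review in conf/jvm.options (values are illustrative starting points, not tested recommendations for your workload):

    -XX:+UseG1GC
    # main G1 tunable; Cassandra's bundled jvm.options suggests 500ms over the 200ms JVM default
    -XX:MaxGCPauseMillis=500
    # from the bundled G1 section; shifts remembered-set work off the STW pause
    -XX:G1RSetUpdatingPauseTimePercent=5
    # lifts the humongous threshold to 8MB (see the region math above)
    -XX:G1HeapRegionSize=16m

And keep detailed GC logging on (-Xloggc:... with -XX:+PrintGCDetails -XX:+PrintGCDateStamps) so you can confirm or rule out humongous allocations before changing anything else.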