Re: Cassandra performance issues after introducing G1 GC

Aaron Wed, 10 Sep 2025 06:36:45 -0700

>
> We saw in the first log line of gc.log that the region size is set to 2M
> and after checking a heap dump of the Cassandra process we observed a high
> number of objects with 1M size. We concluded that using 4M is a safe option
> to start testing.
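
On the humongous side, the numbers you quote look consistent with how I understand G1: an allocation is treated as humongous when it is larger than half a region. With 2M regions, a 1M array plus its header would just cross that line. With 4M regions it would not. Here is a minimal sketch of that rule, assuming the half-region threshold and a 16-byte array header. The class and method names are made up for illustration and are not part of the JVM or Cassandra.

```java
final class G1HumongousSketch {
    static final long MIB = 1024L * 1024L;

    // Assumption: G1 treats an object as humongous when it is larger than half a region.
    public static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Number of contiguous regions such an object would occupy, rounded up.
    public static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        // Hypothetical object: a 1 MiB byte[] payload plus an assumed 16-byte array header.
        long objectBytes = MIB + 16;
        for (long regionMib : new long[] {2, 4, 16}) {
            long regionBytes = regionMib * MIB;
            System.out.printf("region=%dM humongous=%b%n",
                    regionMib, isHumongous(objectBytes, regionBytes));
        }
    }
}
```

If that assumption holds, it would explain why going from 2M to 4M regions removed most of the humongous allocations in your test.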



Isn't the default 16m? I'd be curious to know how it went from 16m to 2m
to begin with. I see that -XX:G1HeapRegionSize=16m is commented out in the
options you pasted. With it commented out, perhaps the JVM was setting it
to 2m implicitly? If it was commented out in error, I'd try setting it back
to 16m and see if things improve.
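
One possible explanation, as far as I understand HotSpot's behavior on JDK 11: when -XX:G1HeapRegionSize is not set, G1 derives the region size from the heap size. It aims for roughly 2048 regions, rounds down to a power of two, and clamps the result between 1M and 32M. Under that assumption, a 2M region would correspond to a heap of roughly 4 to 8 GB. The sketch below is my approximation of that rule, not the JVM's actual code. The class name, method name, and constants are illustrative assumptions.

```java
final class G1RegionSizeSketch {
    static final long MIB = 1024L * 1024L;
    // Assumed bounds and target region count for JDK 11 G1 ergonomics.
    static final long MIN_REGION = 1 * MIB;
    static final long MAX_REGION = 32 * MIB;
    static final long TARGET_REGIONS = 2048;

    // Approximates the region size G1 would pick when G1HeapRegionSize is unset.
    public static long defaultRegionSize(long initialHeapBytes, long maxHeapBytes) {
        long averageHeap = (initialHeapBytes + maxHeapBytes) / 2;
        long size = Math.max(averageHeap / TARGET_REGIONS, MIN_REGION);
        size = Long.highestOneBit(size); // round down to a power of two
        return Math.min(Math.max(size, MIN_REGION), MAX_REGION);
    }

    public static void main(String[] args) {
        long gib = 1024 * MIB;
        // With InitialRAMPercentage == MaxRAMPercentage, initial and max heap are equal.
        for (long heapGib : new long[] {4, 6, 8, 16, 32}) {
            long region = defaultRegionSize(heapGib * gib, heapGib * gib);
            System.out.printf("heap=%dG -> region=%dM%n", heapGib, region / MIB);
        }
    }
}
```

By this estimate, an explicit 16m would only match the ergonomic choice for heaps of around 32 GB, so the first line of gc.log is the place to confirm what the JVM actually selected.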

On Wed, Sep 10, 2025 at 7:04 AM Julien Laurenceau <
julien.laurenc...@pepitedata.com> wrote:

> Hi
>
> I think you may have better luck using JDK 17 with ZGC or Shenandoah,
> as shown by DataStax here:
> https://www.datastax.com/blog/apache-cassandra-benchmarking-40-brings-heat-new-garbage-collectors-zgc-and-shenandoah
>
>
> Regards
>
>
>
> On Wednesday, September 10, 2025 13:50 CEST, "Michalis Kotsiouros (EXT) via
> user" <user@cassandra.apache.org> wrote:
>
>
>
> Hello Cassandra community,
>
> We are using Cassandra 4.1.5 with the G1 garbage collector on Java 11.
>
> We are using the default G1 settings found in the jvm11-server.options
> file shipped with the Cassandra installation. Those are:
>
> ## G1 Settings
> ## Use the Hotspot garbage-first collector.
> -XX:+UseG1GC
> -XX:InitialRAMPercentage=50.0
> -Xlog:gc=info,heap*=info,age*=info,safepoint=info,promotion*=info:file=/var/log/cassandra/gc.log:time,uptime,pid,tid,level:filecount=10,filesize=10485760
> -XX:MaxRAMPercentage=50.0
> #-XX:+ParallelRefProcEnabled
> ##-XX:MaxTenuringThreshold=1
> #-XX:G1HeapRegionSize=16m
> #
> ## Have the JVM do less remembered set work during STW, instead
> ## preferring concurrent GC. Reduces p99.9 latency.
> #-XX:G1RSetUpdatingPauseTimePercent=5
> #
> ## Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
> ## 200ms is the JVM default and lowest viable setting
> ## 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
> -XX:MaxGCPauseMillis=200
>
> ## Optional G1 Settings
> # Save CPU time on large (>= 16GB) heaps by delaying region scanning
> # until the heap is 70% full. The default in Hotspot 8u40 is 40%.
> -XX:InitiatingHeapOccupancyPercent=70
>
> # For systems with > 8 cores, the default ParallelGCThreads is 5/8 the number of logical cores.
> # Otherwise equal to the number of cores when 8 or less.
> # Machines with > 10 cores should try setting these to <= full cores.
> #-XX:ParallelGCThreads=16
> # By default, ConcGCThreads is 1/4 of ParallelGCThreads.
> # Setting both to the same value can reduce STW durations.
> #-XX:ConcGCThreads=16
>
>
>
> We have observed that Cassandra occasionally underperforms in the
> production deployment. Based on log analysis, we could correlate the
> underperformance (slow operations and dropped internal messages) with
> frequent long GC pauses. The Cassandra node would then stop due to OOM,
> and after the restart performance would be fine for some days. After 3-4
> days we would see the same behavior again, with the system recovering
> only after another restart caused by OOM.
>
> Our application previously ran on Cassandra 3.11 with the CMS GC, and we
> did not see such behavior.
>
> We have been checking gc.log and observed that the number of humongous
> regions was reaching around 3K, which is definitely not normal for G1.
>
> After some research about the G1 Garbage collection, we tried to increase
> the region size using the -XX:G1HeapRegionSize JVM option.
>
> We saw in the first log line of gc.log that the region size is set to 2M
> and after checking a heap dump of the Cassandra process we observed a high
> number of objects with 1M size. We concluded that using 4M is a safe option
> to start testing. Our first test in the test deployment shows a
> dramatic reduction in the number of humongous regions, from ~1.5K
> to ~20.
>
> Has anyone else in the community observed similar issues before when using
> the G1 GC?
>
> Do you consider the heap region size setting to be application
> dependent, or does it depend on Cassandra's internal design?
>
> If the region size setting mostly depends on Cassandra's internal
> design, is there any general recommendation that would cover the majority
> of applications?
>
>
>
> BR
>
> MK
>