You're welcome, Michalis! Let us know if you need anything else.

On Wed, Sep 10, 2025 at 10:56 AM Michalis Kotsiouros (EXT) via user <user@cassandra.apache.org> wrote:
> Hello Aaron,
>
> Yes indeed, you are right. I will check by using the values as found in the jvm11-server.options.
>
> From my lab node, I would say that increasing the region size to 4m reduced the number of humongous regions.
>
> Maybe it is safer to go for 16m in production, since this will allow even larger objects to be handled better by the GC.
>
> I will update you with the outcome from the production node.
>
> Thanks a lot for your valuable replies!
>
> BR
> MK
>
> *From:* Aaron <aaronplo...@gmail.com>
> *Sent:* September 10, 2025 18:13
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra performance issues after introducing G1 GC
>
> When enabling G1GC in the JVM(11) options file, the expectation is to comment out the CMS-related lines and un-comment the G1GC-related lines (under the "###G1 Settings" header). This would enable the following settings for G1GC:
>
> -XX:+UseG1GC
> -XX:+ParallelRefProcEnabled
> -XX:MaxTenuringThreshold=1
> -XX:G1HeapRegionSize=16m
> -XX:G1RSetUpdatingPauseTimePercent=5
> -XX:MaxGCPauseMillis=300
>
> This is a solid, recommended starting point for most workloads.
>
> Truth be told, I've configured dozens of Cassandra clusters running with G1GC, and most did not require these values to be significantly altered...maybe MaxGCPauseMillis, depending on the workload.
>
> In any case, if G1HeapRegionSize is a *fraction* of what we recommend, setting it to 16m is where I would start. It also couldn't hurt to check the remaining settings for any significant deviations from the above.
>
> On Wed, Sep 10, 2025 at 9:40 AM Michalis Kotsiouros (EXT) via user <user@cassandra.apache.org> wrote:
>
> Hello Aaron,
>
> No, the -XX:G1HeapRegionSize=16m that is present in the default jvm11-server.options file delivered by the Cassandra installation is already commented out.
>
> That is why I am asking if there is any previous experience by the user community.
>
> Based on the G1 GC documentation, if this parameter is not set, it is determined automatically during JVM startup. I do not know how, though. Based on the gc.log, the current value used is 2m.
>
> BR
> MK
>
> *From:* Aaron <aaronplo...@gmail.com>
> *Sent:* September 10, 2025 16:36
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra performance issues after introducing G1 GC
>
> We saw in the first log line of gc.log that the region size is set to 2M and after checking a heap dump of the Cassandra process we observed a high number of objects with 1M size. We concluded that using 4M is a safe option to start testing.
>
> Isn't the default 16m? I'd be curious to know how that went from 16m to 2m, to begin with. I see it's commented out in what you've pasted above. With it commented out, perhaps it was being set to 2m implicitly? If it was set or commented out in error, I'd try setting it back to 16m and see if things improve.
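On the "determined automatically" behavior discussed above: when -XX:G1HeapRegionSize is unset, G1's ergonomics pick a power-of-two region size between 1m and 32m that yields roughly 2048 regions for the configured heap, which is how a mid-sized heap ends up at 2m. Below is a minimal sketch for confirming what the JVM actually chooses; the 8 GB heap and the log path are assumptions, so substitute your node's values:

    # Ask the JVM which region size its ergonomics select for a given heap
    # (hypothetical 8 GB heap; match your node's -Xms/-Xmx):
    java -Xms8g -Xmx8g -XX:+UseG1GC -XX:+PrintFlagsFinal -version | grep G1HeapRegionSize

    # Or read back what the running node logged at startup (the exact
    # wording of this line varies between JDK builds):
    grep -i -m 1 "region size" /var/log/cassandra/gc.log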
> On Wed, Sep 10, 2025 at 7:04 AM Julien Laurenceau <julien.laurenc...@pepitedata.com> wrote:
>
> Hi,
>
> I think you may have better luck using JDK 17 and ZGC or Shenandoah, as shown by DataStax here:
> https://www.datastax.com/blog/apache-cassandra-benchmarking-40-brings-heat-new-garbage-collectors-zgc-and-shenandoah
>
> Regards
>
> On Wednesday, September 10, 2025 13:50 CEST, "Michalis Kotsiouros (EXT) via user" <user@cassandra.apache.org> wrote:
>
> Hello Cassandra community,
>
> We are using Cassandra 4.1.5 with the G1 garbage collector on Java 11. We are using the default G1 settings as found in the jvm11-server.options delivered by the Cassandra installation. Those are:
>
> ## G1 Settings
> ## Use the Hotspot garbage-first collector.
> -XX:+UseG1GC
> -XX:InitialRAMPercentage=50.0
> -Xlog:gc=info,heap*=info,age*=info,safepoint=info,promotion*=info:file=/var/log/cassandra/gc.log:time,uptime,pid,tid,level:filecount=10,filesize=10485760
> -XX:MaxRAMPercentage=50.0
> #-XX:+ParallelRefProcEnabled
> ##-XX:MaxTenuringThreshold=1
> #-XX:G1HeapRegionSize=16m
> #
> ## Have the JVM do less remembered set work during STW, instead
> ## preferring concurrent GC. Reduces p99.9 latency.
> #-XX:G1RSetUpdatingPauseTimePercent=5
> #
> ## Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
> ## 200ms is the JVM default and lowest viable setting
> ## 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
> -XX:MaxGCPauseMillis=200
>
> ## Optional G1 Settings
> # Save CPU time on large (>= 16GB) heaps by delaying region scanning
> # until the heap is 70% full. The default in Hotspot 8u40 is 40%.
> -XX:InitiatingHeapOccupancyPercent=70
>
> # For systems with > 8 cores, the default ParallelGCThreads is 5/8 the number of logical cores.
> # Otherwise equal to the number of cores when 8 or less.
> # Machines with > 10 cores should try setting these to <= full cores.
> #-XX:ParallelGCThreads=16
> # By default, ConcGCThreads is 1/4 of ParallelGCThreads.
> # Setting both to the same value can reduce STW durations.
> #-XX:ConcGCThreads=16
>
> We have observed in the production deployment that Cassandra occasionally underperforms. Based on log analysis, we could correlate the underperformance (slow operations and dropped internal messages) with frequent long GC pauses. The Cassandra node would then stop due to OOM, and after the restart the performance would be OK for some days. After 3-4 days, we would see the same behavior again, with the system recovering after the OOM-triggered restart.
>
> Our application previously ran on Cassandra 3.11 with the CMS GC, and we did not see such behavior.
>
> We have been checking the gc.log and observed that the number of humongous regions was reaching around 3K, which is definitely not normal for G1. After some research on G1 garbage collection, we tried to increase the region size using the -XX:G1HeapRegionSize JVM option.
>
> We saw in the first log line of gc.log that the region size is set to 2M, and after checking a heap dump of the Cassandra process we observed a high number of objects of 1M size. We concluded that using 4M is a safe option to start testing. Our first test from the test deployment shows a tremendous reduction in the number of humongous regions: from ~1.5K to ~20.
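To put numbers on the observation above: G1 classifies any object larger than half a region as humongous, so with 2m regions the 1m objects seen in the heap dump sit right at the cutoff, while 4m regions raise the cutoff to 2m and 16m regions raise it to 8m. A small sketch for watching the humongous-region count over time, assuming the heap*=info logging configured above (the log wording may differ slightly across JDK builds):

    # With gc+heap logging enabled, each collection logs a transition such
    # as "Humongous regions: 1532->1530"; extract the most recent samples:
    grep -oE "Humongous regions: [0-9]+->[0-9]+" /var/log/cassandra/gc.log | tail -n 20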
> Has anyone else in the community observed similar issues when using the G1 GC?
>
> Do you consider the setting of the heap region size to be application dependent, or does it depend on Cassandra's internal design?
>
> If the region size setting mostly depends on Cassandra's internal design, is there any general recommendation that would cover the majority of applications?
>
> BR
> MK
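Taken together, acting on Aaron's suggestion amounts to un-commenting the G1 section of jvm11-server.options so that it reads roughly as below. This is a sketch of a starting point taken from his message earlier in the thread, not a guaranteed fix; the region size in particular may still need workload-specific testing, as the 4m experiment above shows:

    ### G1 Settings
    -XX:+UseG1GC
    -XX:+ParallelRefProcEnabled
    -XX:MaxTenuringThreshold=1
    -XX:G1HeapRegionSize=16m
    -XX:G1RSetUpdatingPauseTimePercent=5
    -XX:MaxGCPauseMillis=300

After changing the file and restarting the node, the humongous-region counts in gc.log (as sketched above) show whether the new threshold actually absorbs the 1m allocations.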