Hello Cassandra community,

We are using Cassandra 4.1.5 with the G1 garbage collector on Java 11, with the default G1 settings as found in the jvm11-server.options file shipped with the Cassandra installation. Those are:

## G1 Settings
## Use the Hotspot garbage-first collector.
-XX:+UseG1GC
-XX:InitialRAMPercentage=50.0
-Xlog:gc=info,heap*=info,age*=info,safepoint=info,promotion*=info:file=/var/log/cassandra/gc.log:time,uptime,pid,tid,level:filecount=10,filesize=10485760
-XX:MaxRAMPercentage=50.0
#-XX:+ParallelRefProcEnabled
##-XX:MaxTenuringThreshold=1
#-XX:G1HeapRegionSize=16m
#
## Have the JVM do less remembered set work during STW, instead
## preferring concurrent GC. Reduces p99.9 latency.
#-XX:G1RSetUpdatingPauseTimePercent=5
#
## Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
## 200ms is the JVM default and lowest viable setting
## 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
-XX:MaxGCPauseMillis=200

## Optional G1 Settings
# Save CPU time on large (>= 16GB) heaps by delaying region scanning
# until the heap is 70% full. The default in Hotspot 8u40 is 40%.
-XX:InitiatingHeapOccupancyPercent=70

# For systems with > 8 cores, the default ParallelGCThreads is 5/8 the number of logical cores.
# Otherwise equal to the number of cores when 8 or less.
# Machines with > 10 cores should try setting these to <= full cores.
#-XX:ParallelGCThreads=16

# By default, ConcGCThreads is 1/4 of ParallelGCThreads.
# Setting both to the same value can reduce STW durations.
#-XX:ConcGCThreads=16

We have observed that Cassandra occasionally underperforms in our production deployment. From log analysis we could correlate the underperformance (slow operations and dropped internal messages) with frequent, long GC pauses. Eventually the Cassandra node would stop due to an OOM, and after the restart performance would be fine for some days. After 3-4 days we would see the same behavior again, with the system recovering only after the OOM-triggered restart. When our application ran against Cassandra 3.11 with the CMS collector, we did not see this behavior.

Checking gc.log, we observed that the number of humongous regions was reaching around 3K, which is definitely not normal for G1. After some research on G1, we tried to increase the region size using the -XX:G1HeapRegionSize JVM option. The first line of gc.log shows that the region size is set to 2M, and a heap dump of the Cassandra process showed a high number of objects of roughly 1M in size; since G1 allocates any object larger than half a region as humongous, we concluded that 4M is a safe value to start testing with. Our first test in the test deployment shows a tremendous reduction in the number of humongous regions, from ~1.5K to ~20.

Has anyone else in the community observed similar issues when using the G1 GC? Do you consider the heap region size setting to be application dependent, or does it depend on Cassandra's internal design? If the region size setting mostly depends on Cassandra's internal design, is there a general recommendation that would cover the majority of applications?

BR
MK
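P.S. For anyone who wants to reproduce the effect outside of Cassandra, below is a minimal, hypothetical Java sketch (HumongousDemo is a made-up class of ours, not Cassandra code). It only allocates ~1 MiB byte arrays: with -XX:+UseG1GC -XX:G1HeapRegionSize=2m -Xlog:gc* each array, object header included, is just over half a region and should show up as a humongous allocation in the GC log, whereas with -XX:G1HeapRegionSize=4m the same arrays are ordinary allocations.

// HumongousDemo.java -- standalone illustration, not Cassandra code.
// Run with: java -XX:+UseG1GC -XX:G1HeapRegionSize=2m -Xlog:gc* HumongousDemo
// and again with -XX:G1HeapRegionSize=4m to compare the humongous region
// counts reported after each collection.
public class HumongousDemo {
    public static void main(String[] args) {
        // Keep a rotating window of references so some arrays stay live
        // across collections, mimicking medium-lived ~1 MiB objects.
        byte[][] live = new byte[512][];
        for (int i = 0; i < 10_000; i++) {
            // 1 MiB payload; with 2 MiB regions the object (payload plus array
            // header) is larger than half a region, so G1 treats it as humongous.
            live[i % live.length] = new byte[1024 * 1024];
        }
        System.out.println("done, still holding " + live.length + " arrays");
    }
}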