>> Our JVM options are unchanged between 2.2 and 3.11
>>
>
> For the sake of clarity, do you mean:
> (a) you're using the default JVM options in 3.11 and it's different to the
> options you had in 2.2?
> (b) you've copied the same JVM options you had in 2.2 to 3.11?
>

(b), which are the default options from 2.2 (and I believe the default options in 3.11 from a brief glance). Copied here for clarity, though I'm skeptical that GC settings are actually a cause here because I would expect them to only impact the upgraded node and not the cluster overall.

### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled

> The distinction is important because at the moment, you need to go through
> a process of elimination to identify the cause.
>
>
>> Read throughput (rate, bytes read/range scanned, etc.) seems fairly
>> consistent before and after the upgrade across all nodes.
>>
>
> What I was trying to get at is whether the upgraded node was getting hit
> with more traffic compared to the other nodes since it will indicate that
> the longer GCs are just the symptom, not the cause.
>
>

I don't see any distinct change, nor do I see an increase in traffic to the upgraded node that would result in longer GC pauses. Frankly I don't see any changes or aberrations in client-related metrics at all that correlate to the GC pauses, except for the corresponding timeouts.