I would strongly suggest you consider an upgrade to 3.11.x. I found it decreased the space needed by about 30%, in addition to significantly lowering GC pause times.
As a first step, though, why not just revert to CMS for now if that was working OK for you? Then you can convert one host for diagnosis/tuning so the cluster as a whole stays functional. That's also a pretty old version of the JDK to be using with G1. I would definitely upgrade that to 1.8u202 and see if the problem goes away.

On Sun, Feb 10, 2019, 10:22 PM Rajsekhar Mallick <raj.mallic...@gmail.com> wrote:

> Hello Team,
>
> I have a cluster of 17 nodes in production (8 and 9 nodes in 2 DCs).
> Cassandra version: 2.0.11
> Clients connecting using Thrift over port 9160
> JDK version: 1.8.0_66
> GC used: G1GC (16 GB heap)
> Other GC settings:
> MaxGCPauseMillis=200
> ParallelGCThreads=32
> ConcGCThreads=10
> InitiatingHeapOccupancyPercent=50
> Number of CPU cores on each system: 40
> Memory size: 185 GB
> Reads/sec: 300/sec on each node
> Writes/sec: 300/sec on each node
> Compaction strategy used: size-tiered compaction strategy
>
> Identified issues in the cluster:
> 1. Disk space usage across all nodes in the cluster is 80%. We are
> currently working on adding more storage on each node.
> 2. There are 2 tables for which we keep seeing large numbers of
> tombstones. One of the tables has read requests seeing 120 tombstone
> cells in the last 5 mins as compared to 4 live cells. Tombstone warnings
> and error messages about queries getting aborted are also seen.
>
> Current issues seen:
> 1. We keep seeing GC pauses of a few minutes randomly across nodes in
> the cluster. GC pauses of 120 seconds, even 770 seconds, are also seen.
> 2. This leads to nodes getting stalled and clients seeing direct impact.
> 3. The GC pauses we see are not during any of the G1GC phases. The GC
> log message prints "Time to stop threads took 770 seconds". So it is not
> the garbage collector doing any work; rather, stopping the threads at a
> safepoint is taking that much time.
> 4. This issue has surfaced recently after we changed from 8 GB (CMS) to
> 16 GB (G1GC) across all nodes in the cluster.
>
> Kindly do help with the above issue. I am not able to work out whether
> the GC is wrongly tuned, or if this is something else.
>
> Thanks,
> Rajsekhar Mallick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
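Since the log line "Time to stop threads took ..." points at time-to-safepoint rather than GC work itself, it may help to turn on safepoint diagnostics before changing anything else. The following is a minimal sketch, assuming a HotSpot JDK 8 and Cassandra's standard cassandra-env.sh (not flags from the original thread); the exact file and variable name may differ in your setup:

```shell
# Hypothetical additions to cassandra-env.sh (HotSpot JDK 8 flags).
# Print per-safepoint statistics, including the "spin" and "block" phases
# that make up the time-to-safepoint:
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:PrintSafepointStatisticsCount=1"
# Report the safepoint operation and any threads that fail to reach the
# safepoint within the timeout (here 10 s, in milliseconds):
JVM_OPTS="$JVM_OPTS -XX:+SafepointTimeout"
JVM_OPTS="$JVM_OPTS -XX:SafepointTimeoutDelay=10000"
# Log total application-stopped time alongside the GC log:
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
```

If the spin/block columns dominate the safepoint statistics, some thread is slow to reach the safepoint (commonly long-running counted loops, or the process paging/swapping) rather than the collector being slow, which would fit the symptom described above.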