Hello, we have a Cassandra cluster of 7 nodes. All of them have the same JVM GC configuration, and all our reads and writes use the TokenAware policy wrapping a DCAware policy. All nodes are part of the same datacenter.
We are seeing high GC collection times on two nodes. They mostly seem to spend about 300-600 ms in GC. This also seems to result in higher CPU utilisation on these machines. The other 5 nodes don't have this problem. There is no additional repair activity going on in the cluster, and we are not sure why this is happening.

We checked cfhistograms on the two CFs we have in the cluster, and the number of reads seems to be almost the same. We also used cfstats to see the number of SSTables on each node, and one of the nodes with the above problem has twice as many SSTables as the other nodes. This still does not explain why two nodes have high GC overhead.

Our GC config is as below:

JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=50"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"
JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"
JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
# earlier value 131072 = 32768 * 4
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=131072"
JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"
JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32768"
JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32768"
#new
JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"

We are using Cassandra 2.0.17. If anyone has any suggestions as to what else we can look for to understand why this is happening, please do reply.
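
In case it helps with comparing the nodes, one thing we could capture is a GC log from one affected node and one healthy node. The lines below are only a minimal sketch, assuming a HotSpot Java 7 or 8 JVM and the usual cassandra-env.sh layout; the log path is a placeholder and not something from our actual setup.

```shell
# Sketch only: standard HotSpot GC logging flags (Java 7/8), appended in cassandra-env.sh.
# Assumption: /var/log/cassandra exists and is writable by the Cassandra user.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"                 # per-collection breakdown (ParNew / CMS phases)
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"              # wall-clock timestamps, to line up with CPU graphs
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"      # object ages in survivor spaces
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"  # total time application threads were paused
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"   # placeholder log location
```

Comparing the two logs side by side should at least show whether the extra time is in young-generation collections or in the CMS remark phase.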

Thanks,
anishek