Also, MAX_HEAP_SIZE=6G and HEAP_NEWSIZE=4G.
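(For reference, both of those values are set in conf/cassandra-env.sh, which expands them into the JVM heap flags; a minimal sketch using just the values quoted in this thread:)

# conf/cassandra-env.sh -- heap sizing as reported in this thread
MAX_HEAP_SIZE="6G"   # expanded by cassandra-env.sh to -Xms6G -Xmx6G (total heap)
HEAP_NEWSIZE="4G"    # expanded to -Xmn4G (young generation)

(Note that a 4G new gen inside a 6G heap leaves only 2G for the CMS old generation.)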
On Wed, Mar 2, 2016 at 1:40 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> Hey Jeff,
>
> One of the nodes with high GC has 1400 SSTables; all other nodes have
> about 500-900 SSTables. The other node with high GC has 636 SSTables.
>
> The average row size for compacted partitions is about 1640 bytes on all
> nodes. We have replication factor 3, but the problem is only on two nodes.
> The only other thing that stands out in cfstats is that the read and write
> times on the nodes with high GC are 5-7 times higher than on the other 5
> nodes, but I think that's expected.
>
> Thanks
> anishek
>
> On Wed, Mar 2, 2016 at 1:09 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
>> Compaction falling behind will likely cause additional work on reads
>> (more sstables to merge), but I'd be surprised if it manifested in super
>> long GC. When you say twice as many sstables, how many is that?
>>
>> In cfstats, does anything stand out? Is max row size on those nodes
>> larger than on other nodes?
>>
>> What you don't show in your JVM options is the new gen size. If you do
>> have unusually large partitions on those two nodes (especially likely if
>> you have rf=2; if you have rf=3, then there's probably a third misbehaving
>> node you haven't found yet), then raising the new gen size can help handle
>> the garbage created by reading large partitions without having to tolerate
>> the promotion. Estimates for the amount of garbage vary, but it could be
>> "gigabytes" of garbage on a very wide partition (see
>> https://issues.apache.org/jira/browse/CASSANDRA-9754 for work in progress
>> to help mitigate that type of pain).
>>
>> - Jeff
>>
>> From: Anishek Agarwal
>> Reply-To: "user@cassandra.apache.org"
>> Date: Tuesday, March 1, 2016 at 11:12 PM
>> To: "user@cassandra.apache.org"
>> Subject: Lot of GC on two nodes out of 7
>>
>> Hello,
>>
>> We have a Cassandra cluster of 7 nodes, all with the same JVM GC
>> configuration, and all our writes/reads use the TokenAware policy wrapping
>> a DCAware policy. All nodes are part of the same datacenter.
>>
>> We are seeing that two nodes have high GC collection times; they mostly
>> seem to spend about 300-600 ms per collection in GC. This also seems to
>> result in higher CPU utilisation on these machines. The other 5 nodes
>> don't have this problem.
>>
>> There is no additional repair activity going on in the cluster, and we
>> are not sure why this is happening. We checked cfhistograms on the two CFs
>> we have in the cluster, and the number of reads seems to be almost the
>> same.
>>
>> We also used cfstats to see the number of sstables on each node, and one
>> of the nodes with the above problem has twice as many sstables as the
>> other nodes. This still does not explain why two nodes have high GC
>> overheads.
>> Our GC config is as below:
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=50"
>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>> JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"
>> JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
>> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
>> JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>> JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"
>> JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"
>> JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"
>> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>> JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>> JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>> # earlier value 131072 = 32768 * 4
>> JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=131072"
>> JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"
>> JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32768"
>> JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32768"
>> # new
>> JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"
>>
>> We are using Cassandra 2.0.17. If anyone has any suggestion as to what
>> else we can look at to understand why this is happening, please do reply.
>>
>> Thanks
>> anishek
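(Jeff's cfstats questions above, SSTable count and maximum partition size, can be compared across the cluster in one pass. A rough sketch, assuming SSH access to every node; node1..node7 are placeholder hostnames, and the grep patterns match the field labels printed by cfstats in the 2.0.x line, which differ slightly between versions:)

#!/bin/bash
# Pull the two cfstats fields discussed above from every node.
# node1..node7 are placeholder hostnames -- substitute your own.
for host in node1 node2 node3 node4 node5 node6 node7; do
  echo "== $host =="
  ssh "$host" nodetool cfstats \
    | egrep 'Column Family:|Table:|SSTable count|Compacted row maximum size'
done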
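(Since the open question is where the GC time goes, one low-cost next step, not part of the original posts, is to enable GC logging on the two affected nodes in the same JVM_OPTS style used above. -XX:+PrintTenuringDistribution in particular will show whether reads are pushing objects past the survivor spaces and into the old gen, which is the promotion Jeff describes. A sketch; the log path is an assumption:)

# GC diagnostics (sketch) -- append to cassandra-env.sh and restart the node
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"  # path is an assumption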