Also, MAX_HEAP_SIZE=6G and HEAP_NEWSIZE=4G.
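(For reference, both of those values are set in conf/cassandra-env.sh, which expands them into the JVM heap flags; a minimal sketch using just the values quoted in this thread:)

# conf/cassandra-env.sh -- heap sizing as reported in this thread
MAX_HEAP_SIZE="6G"   # expanded by cassandra-env.sh to -Xms6G -Xmx6G (total heap)
HEAP_NEWSIZE="4G"    # expanded to -Xmn4G (young generation)

(Note that a 4G new gen inside a 6G heap leaves only 2G for the CMS old generation.)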
On Wed, Mar 2, 2016 at 1:40 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> Hey Jeff,
>
> One of the nodes with high GC has 1400 SSTables; all other nodes have
> about 500-900 SSTables. The other node with high GC has 636 SSTables.
>
> The average row size for compacted partitions is about 1640 bytes on all
> nodes. We have replication factor 3, but the problem is only on two nodes.
> The only other thing that stands out in cfstats is that the read and write
> times on the nodes with high GC are 5-7 times higher than on the other 5
> nodes, but I think that's expected.
>
> Thanks
> anishek
>
> On Wed, Mar 2, 2016 at 1:09 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
>> Compaction falling behind will likely cause additional work on reads
>> (more sstables to merge), but I'd be surprised if it manifested in super
>> long GC. When you say twice as many sstables, how many is that?
>>
>> In cfstats, does anything stand out? Is max row size on those nodes
>> larger than on other nodes?
>>
>> What you don't show in your JVM options is the new gen size. If you do
>> have unusually large partitions on those two nodes (especially likely if
>> you have rf=2; if you have rf=3, then there's probably a third misbehaving
>> node you haven't found yet), then raising the new gen size can help handle
>> the garbage created by reading large partitions without having to tolerate
>> the promotion. Estimates for the amount of garbage vary, but it could be
>> "gigabytes" of garbage on a very wide partition (see
>> https://issues.apache.org/jira/browse/CASSANDRA-9754 for work in progress
>> to help mitigate that type of pain).
>>
>> - Jeff
>>
>> From: Anishek Agarwal
>> Reply-To: "user@cassandra.apache.org"
>> Date: Tuesday, March 1, 2016 at 11:12 PM
>> To: "user@cassandra.apache.org"
>> Subject: Lot of GC on two nodes out of 7
>>
>> Hello,
>>
>> We have a Cassandra cluster of 7 nodes, all with the same JVM GC
>> configuration, and all our writes/reads use the TokenAware policy wrapping
>> a DCAware policy. All nodes are part of the same datacenter.
>>
>> We are seeing that two nodes have high GC collection times; they mostly
>> seem to spend about 300-600 ms per collection in GC. This also seems to
>> result in higher CPU utilisation on these machines. The other 5 nodes
>> don't have this problem.
>>
>> There is no additional repair activity going on in the cluster, and we
>> are not sure why this is happening. We checked cfhistograms on the two CFs
>> we have in the cluster, and the number of reads seems to be almost the
>> same.
>>
>> We also used cfstats to see the number of sstables on each node, and one
>> of the nodes with the above problem has twice as many sstables as the
>> other nodes. This still does not explain why two nodes have high GC
>> overheads.
>> Our GC config is as below:
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=50"
>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>> JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"
>> JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
>> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
>> JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>> JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"
>> JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"
>> JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"
>> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>> JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>> JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>> # earlier value 131072 = 32768 * 4
>> JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=131072"
>> JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"
>> JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32768"
>> JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32768"
>> # new
>> JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"
>>
>> We are using Cassandra 2.0.17. If anyone has any suggestion as to what
>> else we can look at to understand why this is happening, please do reply.
>>
>> Thanks
>> anishek
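(Jeff's cfstats questions above, SSTable count and maximum partition size, can be compared across the cluster in one pass. A rough sketch, assuming SSH access to every node; node1..node7 are placeholder hostnames, and the grep patterns match the field labels printed by cfstats in the 2.0.x line, which differ slightly between versions:)

#!/bin/bash
# Pull the two cfstats fields discussed above from every node.
# node1..node7 are placeholder hostnames -- substitute your own.
for host in node1 node2 node3 node4 node5 node6 node7; do
  echo "== $host =="
  ssh "$host" nodetool cfstats \
    | egrep 'Column Family:|Table:|SSTable count|Compacted row maximum size'
done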
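(Since the open question is where the GC time goes, one low-cost next step, not part of the original posts, is to enable GC logging on the two affected nodes in the same JVM_OPTS style used above. -XX:+PrintTenuringDistribution in particular will show whether reads are pushing objects past the survivor spaces and into the old gen, which is the promotion Jeff describes. A sketch; the log path is an assumption:)

# GC diagnostics (sketch) -- append to cassandra-env.sh and restart the node
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"  # path is an assumption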