Hi Anishek,

Even though it highly depends on your workload, here are my thoughts:

`HEAP_NEWSIZE=4G` is probably far too high (try something between 1200M and 2G).
`MAX_HEAP_SIZE=6G` might be too low; how much memory is available? You might
want to keep this as is, or even reduce it, if you have less than 16 GB of
native memory. Go with 8 GB if you have a lot of memory.
`-XX:MaxTenuringThreshold=50` is the highest value I have seen in use so
far. I had luck with values between 4 and 16 in the past. I would give 15 a
try.
`-XX:CMSInitiatingOccupancyFraction=70` --> Why not use the default, 75?
Starting from the defaults and then tuning from there is generally a good
idea.
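
As a rough sketch of what I would try first in cassandra-env.sh (these exact
values are just examples, the right numbers depend on your available memory
and workload):

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=15"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"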

You also use a bunch of options I don't know about. If you are uncertain
about them, you could try a default configuration with just the changes
above applied, starting from the stock cassandra-env.sh:
https://github.com/apache/cassandra/blob/cassandra-2.0/conf/cassandra-env.sh
You might also find useful information in a nice reference on this topic,
Al Tobey's blog post about tuning Cassandra 2.1; go to the 'Java Virtual
Machine' part:
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html

FWIW, I also saw improvements in the past by upgrading to 2.1, Java 8 and
G1GC. G1GC is supposed to be easier to configure, too.

> the average row size for compacted partitions is about 1640 bytes on all
> nodes. We have replication factor 3 but the problem is only on two nodes.

I think Jeff is trying to spot a wide row messing with your system, so
looking at the max row size on those nodes compared to the others is more
relevant than the average size for this check. You can compare it with
something like the command below.
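
For example, something along these lines on each node (the labels vary a bit
between versions: 2.0 prints "Column Family" and "Compacted row maximum size",
2.1+ prints "Table" and "Compacted partition maximum bytes"):

nodetool cfstats | grep -i -E "column family|table:|maximum"

A node reporting a maximum far bigger than its peers would point at a wide row.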

> the only other thing that stands out in cfstats is the read time and write
> time on the nodes with high GC is 5-7 times higher than other 5 nodes, but
> i think thats expected.


I would probably look at this the other way around: since the JVM / GC are
configured the same way cluster-wide, I imagine the extra GC is a
consequence of something going wrong on those nodes. GC / JVM issues are
often due to Cassandra / system / hardware issues inducing extra pressure
on the JVM, so I have often seen high GC be a consequence rather than the
root cause of an issue. I would try to tune the JVM / GC only once the
system is healthy.

To explore this possibility:

Does this command show any dropped or blocked tasks? Those would add
pressure on the heap.
nodetool tpstats
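
A convenient way to watch it over a few minutes and spot which counters are
moving (assuming 'watch' is available on your system, this is just sugar
around the command above):

watch -d -n 5 nodetool tpstats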

Do you have errors in the logs? It is always good to check when facing an issue.
grep -i "ERROR" /var/log/cassandra/system.log

How are compactions tuned (throughput + concurrent compactors)? This tuning
might explain compactions not keeping up or high GC pressure.
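
To see where compactions stand and to adjust the throughput live (the 32 MB/s
below is only an example value, not a recommendation; concurrent_compactors
itself lives in cassandra.yaml and needs a restart, and the path depends on
your install):

nodetool compactionstats
nodetool setcompactionthroughput 32
grep concurrent_compactors /path/to/cassandra.yaml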

What are your disks / CPU? That would help us give you reasonable values to
try.
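
Something like this would give us the broad picture (standard Linux tools,
availability depends on your distribution):

lscpu
lsblk
free -m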

Is there some iowait? It could point to a bottleneck or bad hardware.
iostat -mx 5 100

...

Hope one of those points you to an issue, but there are many more things
you could check.

Let us know how it goes,

C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2016-03-02 10:33 GMT+01:00 Anishek Agarwal <anis...@gmail.com>:

> also MAX_HEAP_SIZE=6G and HEAP_NEWSIZE=4G.
>
> On Wed, Mar 2, 2016 at 1:40 PM, Anishek Agarwal <anis...@gmail.com> wrote:
>
>> Hey Jeff,
>>
>> one of the nodes with high GC has 1400 sstables; all other nodes have
>> about 500-900 sstables. The other node with high GC has 636 sstables.
>>
>> the average row size for compacted partitions is about 1640 bytes on all
>> nodes. We have replication factor 3 but the problem is only on two nodes.
>> the only other thing that stands out in cfstats is the read time and
>> write time on the nodes with high GC is 5-7 times higher than other 5
>> nodes, but i think thats expected.
>>
>> thanks
>> anishek
>>
>>
>>
>>
>> On Wed, Mar 2, 2016 at 1:09 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>>> Compaction falling behind will likely cause additional work on reads
>>> (more sstables to merge), but I’d be surprised if it manifested in super
>>> long GC. When you say twice as many sstables, how many is that?
>>>
>>> In cfstats, does anything stand out? Is max row size on those nodes
>>> larger than on other nodes?
>>>
>>> What you don’t show in your JVM options is the new gen size – if you do
>>> have unusually large partitions on those two nodes (especially likely if
>>> you have rf=2 – if you have rf=3, then there’s probably a third node
>>> misbehaving you haven’t found yet), then raising new gen size can help
>>> handle the garbage created by reading large partitions without having to
>>> tolerate the promotion. Estimates for the amount of garbage vary, but it
>>> could be “gigabytes” of garbage on a very wide partition (see
>>> https://issues.apache.org/jira/browse/CASSANDRA-9754 for work in
>>> progress to help mitigate that type of pain).
>>>
>>> - Jeff
>>>
>>> From: Anishek Agarwal
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Tuesday, March 1, 2016 at 11:12 PM
>>> To: "user@cassandra.apache.org"
>>> Subject: Lot of GC on two nodes out of 7
>>>
>>> Hello,
>>>
>>> we have a Cassandra cluster of 7 nodes, all of them with the same JVM GC
>>> configurations, and all our writes / reads use the TokenAware policy
>>> wrapping a DCAware policy. All nodes are part of the same datacenter.
>>>
>>> We are seeing that two nodes are having high GC collection times. They
>>> mostly seem to spend about 300-600 ms in GC. This also seems to result
>>> in higher CPU utilisation on these machines. The other 5 nodes don't
>>> have this problem.
>>>
>>> There is no additional repair activity going on the cluster, we are not
>>> sure why this is happening.
>>> we checked cfhistograms on the two CF we have in the cluster, and the
>>> number of reads seems to be almost the same.
>>>
>>> we also used cfstats to see the number of sstables on each node, and one
>>> of the nodes with the above problem has twice as many sstables as the
>>> other nodes. This still does not explain why two nodes have high GC
>>> overheads. Our GC config is as below:
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=50"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>>>
>>> # earlier value 131072 = 32768 * 4
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=131072"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32768"
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32768"
>>>
>>> #new
>>>
>>> JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"
>>>
>>> We are using Cassandra 2.0.17. If anyone has any suggestion as to what
>>> else we can look for to understand why this is happening, please do
>>> reply.
>>>
>>>
>>>
>>> Thanks
>>> anishek
>>>
>>>
>>>
>>
>
