Hello,

I completely agree with Elliott above: it is hard to say what *this cluster* needs. That said, my colleague Jon wrote a small guide on how to tune this in most cases, or at least as a starting point. We often write posts on questions that come up repeatedly on the mailing list, as that gives us hints about good topics to cover (and saves us repeating the same answer here every day :)). This was definitely one of the most 'in demand' topics, and I believe it might be really helpful to you: https://thelastpickle.com/blog/2018/04/11/gc-tuning.html
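Whatever you end up doing, a prerequisite is having GC logs to look at. On Cassandra 3.x with Java 8 the logging flags live in conf/jvm.options and look roughly like this (a sketch only; depending on your version some or all of it may already be enabled by default, and the log path and sizes are of course up to you):

    # Where to write the GC log (rotated so it does not grow forever)
    -Xloggc:/var/log/cassandra/gc.log
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=10
    -XX:GCLogFileSize=10M
    # Detail needed to compute pause times and GC throughput
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime
    -XX:+PrintTenuringDistribution

With rotation enabled the logs stay small, and these are the files you can later upload to the analysis tool I mention below.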
No one starts working on garbage collection unless they have to; for many of us it is a different world we know nothing about. That was my case: I did not touch GC at all for my first two years. But really, you will see that GC can be reasoned about and that things can be made much better in some cases. The first time I changed the GC settings, I halved the cluster size and still halved the latency, so the improvements were substantial and well worth the effort we put into it.

To break the ice with GC, I found http://gceasy.io to be an excellent way to monitor and troubleshoot it. Feed it some GC logs and it will give you the GC throughput (the % of time the JVM is available, i.e. not in a 'stop the world' pause). To give you some numbers, this should be above 95-98% as a minimum. If your throughput is lower, chances are high that you can 'easily' improve performance there. The analysis contains a lot more detail that might help you wrap your head around GC and tune it properly.

I generally prefer using CMS, but I have seen some very successful clusters using G1GC. G1GC is known to work better with bigger heaps. If you are going to use 8 GB (or even 16 GB) for the heap, I would most probably stick to CMS and tune it properly; but again, G1GC might work quite well, with far less effort, if you can give the heap 16+ GB, say.

Work on a canary node (a single, randomly picked node) while changing this, then observe the logs with GCeasy. Repeat until you are happy with the result (I would be happy with roughly 95% to 98% GC throughput, i.e. 2 to 5% of the time spent in pauses). But what really matters is that after the changes you see better latency, fewer dropped messages, etc.; the GC throughput is just a way to measure the impact. When the workload seems optimised enough, or you are tired of playing with GC, apply the changes everywhere and observe the impact on the cluster (latency, dropped messages, CPU load...). To make this a bit more concrete, I pasted a sketch of the kind of settings involved below the quoted mail.

Hope that helps and completes Elliott's excellent answer somewhat.

-----------------------
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


On Mon, 20 May 2019 at 23:31, Elliott Sims <elli...@backblaze.com> wrote:

> It's not really something that can be easily calculated based on write
> rate, but more something you have to find empirically and adjust
> periodically.
> Generally speaking, I'd start by running "nodetool gcstats" or similar and
> just see what the GC pause stats look like. If it's not pausing much or
> for long, you're good. If it is, you'll likely need to do some tuning
> based on GC logging which may involve increasing the heap but could also
> mean decreasing it or changing the collection strategy.
>
> Generally speaking, with G1GC you can get away with just setting a larger
> heap than you really need and it's close enough to optimal. CMS is
> theoretically more efficient, but far more complex to get tuned properly
> and tends to fail more dramatically.
>
> On Mon, May 20, 2019 at 7:38 AM Akshay Bhardwaj <
> akshay.bhardwaj1...@gmail.com> wrote:
>
>> Hi Experts,
>>
>> I have a 5 node cluster with 8 core CPU and 32 GiB RAM.
>>
>> If I have a write TPS of 5K/s and read TPS of 8K/s, I want to know what
>> is the optimal heap size configuration for each cassandra node.
>>
>> Currently, the heap size is set at 8GB. How can I know if cassandra
>> requires more or less heap memory?
>>
>> Akshay Bhardwaj
>> +91-97111-33849
>>
>
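PS: to make 'tune it properly' a bit more concrete, these are the kind of knobs I am referring to, as they appear in conf/jvm.options on Cassandra 3.x (cassandra-env.sh on older versions). This is only a sketch built around common defaults, not a recommendation; the right values depend entirely on your hardware and workload:

    # Fixed heap size (set -Xms equal to -Xmx so the heap never resizes)
    -Xms8G
    -Xmx8G

    # CMS (the historical Cassandra default); the young generation size (-Xmn)
    # is usually the first knob to play with on a canary node
    -XX:+UseParNewGC
    -XX:+UseConcMarkSweepGC
    -XX:CMSInitiatingOccupancyFraction=75
    -XX:+UseCMSInitiatingOccupancyOnly
    -Xmn2G
    -XX:SurvivorRatio=8
    -XX:MaxTenuringThreshold=1

    # Or G1, typically with a bigger heap (16 GB+) - use one collector, not both
    # -XX:+UseG1GC
    # -XX:MaxGCPauseMillis=500

And before changing anything, running 'nodetool gcstats' on a node, as Elliott suggested, gives a quick first view of recent GC pause counts and durations without touching any configuration.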