Hello Riccardo,

I noticed I have been writing a novel to answer a simple couple of
questions again ¯\_(ツ)_/¯. So here is a short answer, in case that's
what you were looking for :). Also, a warning: increasing the compaction
throughput might be counter-productive and stress the cluster even more.
There is more information below ('About the issue').

*tl;dr*:

What about using 'nodetool setcompactionthroughput XX' instead? It should
be available there.

In the same way, 'nodetool getcompactionthroughput' gives you the current
value. Be aware that a change done through JMX/nodetool is *not* permanent:
you still need to update the cassandra.yaml file.
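
For example (a sketch; 64 MB/s is an arbitrary value, pick one that fits
your disks and workload):

# check the current value, then raise it on this node only (not persistent)
nodetool getcompactionthroughput
nodetool setcompactionthroughput 64

# to make it permanent, also set it in cassandra.yaml:
# compaction_throughput_mb_per_sec: 64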

If you really want to use the MBean through JMX, because using 'nodetool'
is too easy (or for any other reason :p):

Mbean: org.apache.cassandra.service.StorageServiceMBean
Attribute: CompactionThroughputMbPerSec

*Long story* with the "how to", since I went through this search myself and
did not know where this MBean was.

> Can someone point me to the right mbean?
> I can not really find good docs about mbeans (or tools ...)


I am not sure about the doc, but you can use jmxterm (
http://wiki.cyclopsgroup.org/jmxterm/download.html).
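
A minimal jmxterm session to change the value could look like this (a
sketch, assuming the default JMX port 7199, no JMX authentication, and
whatever the jar is named for the version you downloaded):

$ java -jar jmxterm-uber.jar -l localhost:7199
$> set -b org.apache.cassandra.db:type=StorageService CompactionThroughputMbPerSec 64
$> get -b org.apache.cassandra.db:type=StorageService CompactionThroughputMbPerSec
$> quit

(If I remember correctly, 'org.apache.cassandra.db:type=StorageService' is
the object name StorageService is registered under; see below for how to
find it yourself.)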

In place of the docs, I use CCM (https://github.com/riptano/ccm) + jconsole
to find the MBeans locally:

* Add loopback addresses for ccm (see the readme file)
* Then create the cluster: 'ccm create Cassandra-3-0-6 -v 3.0.6 -n 3 -s'
* Start jconsole using the right pid: 'jconsole $(ccm node1 show | grep pid
| cut -d "=" -f 2)'
* Explore MBeans, try to guess where this could be (and discover other
funny stuff in there :)).
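
If you prefer the command line over jconsole, the same exploration works
with jmxterm against a ccm node (a sketch; check the actual JMX port with
'ccm node1 show', it is usually 7100 for node1):

$ java -jar jmxterm-uber.jar -l localhost:7100
$> domains
$> domain org.apache.cassandra.db
$> beans
$> info -b org.apache.cassandra.db:type=StorageService
$> quit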

I must admit I did not find it this way using C* 3.0.6 and jconsole.
So I looked at the code: with a local copy of C* 3.0.6, I ran 'grep -RiI
CompactionThroughput', with this result:
https://gist.github.com/arodrime/f9591e4bdd2b1367a496447cdd959006

With this I could find the right MBean; the only code documentation that is
always up to date is the code itself, I am afraid:

'./src/java/org/apache/cassandra/service/StorageServiceMBean.java:
public void setCompactionThroughputMbPerSec(int value);'

Note that searching the code also leads you to nodetool ;-).

I could finally find the MBean in the 'jconsole' too:
https://cdn.pbrd.co/images/HuUya3x.png (not sure how long this link will
live).

jconsole also shows you which attributes can be set and which cannot.

You can now find any other MBean you might need, I hope :).


> I'd like to test the change of concurrent compactors to see if it helps
> when the system is under stress.


*About the issue*

You don't exactly say what you are observing. What is that "stress"? How is
it impacting the cluster?

I ask because I am afraid this change might not help and could even be
counter-productive. Even though having SSTables nicely compacted makes a
huge difference at read time, if that's already the case for you and the
data is already nicely compacted, this change won't help. It might even
make things slightly worse if the current bottleneck is disk IO during a
stress period, as the compactors would increase their disk read throughput
and thus possibly compete with the read requests for disk throughput.

If you have a similar number of sstables on all nodes, not many pending
compactions (nodetool compactionstats -H) and read operations are hitting a
small number of sstables (nodetool tablehistograms), then you probably don't
need to increase the compaction speed.
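
For example, a quick check on one node (keyspace/table names below are
placeholders for your own):

# pending and currently running compactions
nodetool compactionstats -H
# the 'SSTables' histogram shows how many sstables a read touches
nodetool tablehistograms my_keyspace my_table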

Let's say that the compaction throughput is not often the cause of stress
during peak hours, nor a direct way to make things 'faster'. Generally, when
compaction goes wrong, the number of sstables goes *through* the roof. If
you have a chart showing the number of sstables, you can see this really
well.

Of course, if you feel you are in this case, increasing the compaction
throughput will definitely help, provided the cluster also has spare disk
throughput.

To check what's wrong, if you believe it's something different, here are
some useful commands:

- nodetool tpstats (check for pending/blocked/dropped tasks there)
- Check WARN and ERROR messages in the logs (i.e. grep -e "WARN" -e "ERROR"
/var/log/cassandra/system.log)
- Check local latencies (nodetool tablestats / nodetool tablehistograms) and
compare them to the client request latency. At the node level, reads should
probably be in the single-digit milliseconds, rather close to 1 ms with
SSDs, and writes below the millisecond most probably (it depends on the data
size too, etc...).
- Trace a query during this period, see what takes time (for example from
'cqlsh' - 'TRACING ON; SELECT ...')
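
Put together, a minimal 'health sweep' on one node could look like this
(again, keyspace/table names are placeholders):

nodetool tpstats
grep -e "WARN" -e "ERROR" /var/log/cassandra/system.log | tail -50
nodetool tablestats my_keyspace.my_table
nodetool tablehistograms my_keyspace my_table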

You can also analyze the *Garbage Collection* activity. As Cassandra runs on
the JVM, a badly tuned GC will induce long pauses. Depending on the
workload, and I must say for most of the clusters I work on, the default
tuning is not that good and can keep servers busy 10-15% of the time with
stop-the-world GC.
You might find this post from my colleague Jon about GC tuning for Apache
Cassandra interesting:
http://thelastpickle.com/blog/2018/04/11/gc-tuning.html. Reducing GC pressure
is a very common way to optimize a Cassandra cluster and adapt it to your
workload/hardware.
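
A quick first look at GC activity is to check the GCInspector lines that
Cassandra writes to its own log (the log path may differ on your install):

# long stop-the-world pauses are reported by the GCInspector
grep -i "GCInspector" /var/log/cassandra/system.log | tail -20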

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


2018-07-17 17:23 GMT+01:00 Riccardo Ferrari <ferra...@gmail.com>:

> Hi list,
>
> Cassandra 3.0.6
>
> I'd like to test the change of concurrent compactors to see if it helps
> when the system is under stress.
>
> Can someone point me to the right mbean?
> I can not really find good docs about mbeans (or tools ...)
>
> Any suggestion much appreciated, best
>
