Re: Odd CPU utilization spikes on 1 node out of 30 during repair

Oleksandr Shulgin Wed, 26 Sep 2018 04:32:41 -0700

On Wed, Sep 26, 2018 at 1:07 PM Anup Shirolkar <
anup.shirol...@instaclustr.com> wrote:


>
> Looking at information you have provided, the increased CPU utilisation
> could be because of repair running on the node.
> Repairs are resource intensive operations.
>
> Restarting the node should have halted repair operation getting the CPU
> back to normal.
>

The repair was running on all nodes at the same time, yet only one node
showed CPU utilization significantly different from the rest.
As I've mentioned, we are running non-incremental parallel repair using
Cassandra Reaper.
After the node was restarted, Reaper gave it new repair tasks and it
carried on repairing as before, but this time without exhibiting the odd
behavior.

> In some cases, repairs trigger additional operations e.g. compactions,
> anti-compactions
> These operations could cause extra CPU utilisation.
> What is the compaction strategy used on majority of keyspaces ?
>

For the 2 tables involved in this regular repair we are using
TimeWindowCompactionStrategy with time windows of 30 days.

> Talking about CPU utilisation *percentage*: although it has doubled, the
> increase is 15%.
> It would be interesting to know the number of CPU cores on these nodes to
> judge the absolute increase in CPU utilisation.
>

All nodes use the same hardware on AWS EC2: r4.xlarge instances with 4
vCPUs each.
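
For a sense of scale, here is a rough sketch of the absolute increase. It
assumes the monitoring figure is a percentage averaged over all vCPUs and
that the quoted 15% means 15 percentage points; both are assumptions about
how the metric is reported, and the function name is made up for
illustration.

```python
def extra_busy_vcpus(vcpus, increase_pct_points):
    """Convert a rise in averaged CPU utilisation into busy-vCPU equivalents.

    Assumes the percentage is averaged across all vCPUs of the instance.
    """
    return vcpus * increase_pct_points / 100


# r4.xlarge has 4 vCPUs; the increase discussed above is 15 points.
print(extra_busy_vcpus(4, 15))
```

Under those assumptions the difference is a bit more than half of one vCPU
being kept busy on that node.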

> You should try to find the root cause behind the behaviour and decide
> course of action.
>

Sure, that's why I was asking for ideas on how to find the root cause. :-)

> Effective use of monitoring and logs can help you identify the root cause.
>

As I've mentioned, we do have monitoring and I've checked the logs, but
that didn't help to identify the issue so far.
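
One approach that might help if the behavior shows up again is to map the
busiest OS threads (for example as listed by `top -H -p <pid>`) to JVM
thread names in a `jstack` dump, where each thread header carries its
native thread ID as a hex `nid=0x...` field. Below is a minimal sketch of
that lookup. The function names are hypothetical, and it assumes the
HotSpot thread dump header format; it is not something I have run against
the affected node.

```python
import re

# Thread header lines in a HotSpot thread dump look roughly like:
#   "SomeThread:1" #42 daemon prio=5 os_prio=0 tid=0x... nid=0x1f3a runnable [...]
# The nid field is the native (OS) thread ID in hexadecimal.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)')


def threads_by_native_id(jstack_text):
    """Return {decimal native thread ID: JVM thread name} parsed from a dump."""
    result = {}
    for line in jstack_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            result[int(match.group("nid"), 16)] = match.group("name")
    return result


def name_hot_threads(jstack_text, hot_tids):
    """Map decimal thread IDs (as shown by top -H) to JVM thread names."""
    names = threads_by_native_id(jstack_text)
    return {tid: names.get(tid, "<not in dump>") for tid in hot_tids}
```

Feeding it the text of a dump taken during a spike, plus the thread IDs
that `top -H` shows at the top, should indicate which thread pool is
burning the CPU, which the aggregate node-level metrics do not show.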

Regards,
--
Alex
