On Thu, Jun 13, 2019 at 10:36 AM R. T. <rastr...@protonmail.com.invalid> wrote:
> Well, actually by running cfstats I can see that the totaldiskspaceused is
> about ~1.2 TB per node in DC1 and ~1 TB per node in DC2. DC2 was off
> for a while, that's why there is a difference in space.
>
> I am using Cassandra 3.0.6, and
> my stream_throughput_outbound_megabits_per_sec is the default setting, so
> according to my version that is 200 Mbps (25 MB/s).

And the other setting, compaction_throughput_mb_per_sec? It is also highly
relevant for repair performance, as streamed-in files need to be compacted
with the files already on the nodes. In our experience, a change in the
compaction throughput limit is reflected almost linearly in the repair run
time. The default of 16 MB/s is too limiting for any production-grade setup,
I believe. We go as high as 90 MB/s on AWS EBS gp2 data volumes. But don't
take that as gospel: I'd suggest you increase the setting gradually (e.g. by
doubling it) and observe how it affects repair performance (and client
latencies).

Have you tried "parallel" instead of "DC parallel" mode? The latter is really
poorly named and actually means something else, as neatly highlighted in this
SO answer: https://dba.stackexchange.com/a/175028

Last, but not least: are you using the default number of vnodes, 256? The
overhead of a large number of vnodes (times the number of nodes) can be quite
significant. We've seen major improvements in repair runtime after switching
from 256 to 16 vnodes on Cassandra 3.0.

Cheers,
--
Alex
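
P.S. In case it helps, here is a rough sketch of the knobs discussed above.
The values are only illustrative starting points for your own testing, not
recommendations for your cluster:

    # check and raise the compaction throughput cap at runtime (MB/s);
    # double it, watch repair times and client latencies, then repeat
    nodetool getcompactionthroughput
    nodetool setcompactionthroughput 32

    # once you have settled on a value, persist it in cassandra.yaml
    compaction_throughput_mb_per_sec: 32

    # fewer vnodes only take effect on newly bootstrapped nodes (e.g. a
    # new DC); num_tokens cannot be changed in place on existing nodes
    num_tokens: 16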