Re: repair takes 10x more time in one DC compared to the other

Sylvain Lebresne Wed, 25 Jun 2014 08:47:42 -0700

TL;DR, this is not unexpected and this is perfectly fine.

For every node, 'repair -pr' will repair the "primary" (where primary
means "the first range on the ring picked by the consistent hashing for
this node given its token", nothing more) range of the node in the ring.
And that range will be repaired for all replicas in all data-centers. When
you assign tokens to multiple DCs, it's actually pretty common to offset the
tokens of one DC slightly compared to the other one. This will result in
the "primary" ranges always being small in one DC but not in the other. But
please note that this is perfectly ok, it does not imply any imbalance
between data-centers. It also doesn't really mean that the nodes of one DC
actually do a lot more work than the other ones: all nodes most likely
contribute roughly the same amount of work to the repair. It only means that
the nodes of one DC "coordinate" more repair work than those of the other
DC. Which is not really a big deal since coordinating a repair is cheap.
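
To make the effect of the offset concrete, here is a minimal sketch (plain
Python, not Cassandra code). It assumes a toy ring of 1200 tokens, two DCs
with 3 nodes each, DC2 offset by 1 from DC1, and takes a node's primary
range to be (previous token on the ring, own token]. The ring size, the
token values and the name primary_range_sizes are made up for illustration.

```python
RING_SIZE = 1200  # hypothetical toy ring, far smaller than a real partitioner's


def primary_range_sizes(tokens_by_dc, ring_size=RING_SIZE):
    """Return {(dc, token): size of the primary range ending at that token}.

    The primary range of a node is taken to be (previous token, own token],
    where "previous" is the closest lower token on the ring in ANY DC.
    """
    # Merge the tokens of all DCs into one sorted ring.
    ring = sorted(
        (token, dc) for dc, tokens in tokens_by_dc.items() for token in tokens
    )
    sizes = {}
    for i, (token, dc) in enumerate(ring):
        prev_token = ring[i - 1][0]  # index -1 wraps around to the last token
        sizes[(dc, token)] = (token - prev_token) % ring_size
    return sizes


# DC2 uses the same tokens as DC1, offset by 1 (illustrative values).
tokens = {"DC1": [0, 400, 800], "DC2": [1, 401, 801]}
sizes = primary_range_sizes(tokens)

for (dc, token), size in sorted(sizes.items()):
    print(f"{dc} token {token:>4}: primary range covers {size} tokens")
```

With these values every DC1 node ends up with a primary range of 399
tokens and every DC2 node with a range of 1 token, so the DC1 nodes
coordinate almost all of the repair even though both DCs hold the same
data.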


--
Sylvain


On Wed, Jun 25, 2014 at 4:43 PM, Paulo Ricardo Motta Gomes <
paulo.mo...@chaordicsystems.com> wrote:

> Hello,
>
> I'm running repair on a large CF with the "-pr" flag in 2 different
> DCs. In one of the DCs the operation takes about 1 hour per node, while in
> the other it takes 10 hours per node.
>
> I would expect the times to differ, but not so much. The writes on that CF
> all come from the DC where it takes 10 hours per node, could this be the
> cause why it takes so long on this DC?
>
> Additional info: C* 1.2.16, both DCs have the same replication factor.
>
> Cheers,
>
> --
> *Paulo Motta*
>
> Chaordic | *Platform*
> *www.chaordic.com.br <http://www.chaordic.com.br/>*
> +55 48 3232.3200
>
