Sweet, I 100% understand this now from these last few emails.  It has always 
been a bit confusing.

Thanks,
Dean

From: Sylvain Lebresne <sylv...@datastax.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, March 1, 2013 4:36 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: -pr vs. no -pr

On Thu, Feb 28, 2013 at 11:39 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
Isn't it true that if I have 6 nodes (RF=3), I could run nodetool repair on 
just 2 nodes instead of using nodetool repair -pr?

Yes, it is true.

To be more precise, in your case you have 2 options:
 1) running repair *without* -pr on 2 nodes (assuming you pick the correct 2 
nodes; *not* just any 2 nodes will do)
 2) running repair *with* -pr on all 6 nodes

Both of those cases would 1) repair the full ring and 2) do the same amount of 
work.
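
To make the two options concrete, here is a small Python sketch of a toy ring. 
It assumes one token per node (no vnodes) and a SimpleStrategy-like placement 
where each range lives on its primary node and the next RF-1 nodes clockwise. 
The function names are made up for illustration and are not part of nodetool.

```python
from itertools import combinations


def ranges_repaired(node, n_nodes, rf, primary_only):
    """Ranges touched by one repair run on `node` in a toy ring.

    Range i is the range whose primary replica is node i; it is also
    stored on the next rf-1 nodes clockwise (an assumed placement).
    """
    if primary_only:
        # Models `nodetool repair -pr`: only the node's primary range.
        return {node}
    # Without -pr, the node repairs every range it holds a replica of:
    # its own primary range plus those of its rf-1 predecessors.
    return {(node - k) % n_nodes for k in range(rf)}


def covering_pairs(n_nodes=6, rf=3):
    """Pairs of nodes whose non-pr repairs together cover the whole ring."""
    full_ring = set(range(n_nodes))
    return [
        pair
        for pair in combinations(range(n_nodes), 2)
        if ranges_repaired(pair[0], n_nodes, rf, False)
        | ranges_repaired(pair[1], n_nodes, rf, False) == full_ring
    ]


print(covering_pairs())
```

Under these assumptions only 3 of the 15 possible pairs cover the ring, namely 
the pairs of nodes sitting opposite each other (0 and 3, 1 and 4, 2 and 5). 
Each such pair touches 6 ranges in total, the same as 6 runs with -pr.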

What is the advantage of -pr then?

As it happens, yours is a special case: your number of nodes is a multiple of 
your replication factor. If that weren't the case (say 5, 7 or 8 nodes with 
RF=3), there would be *no way* to repair the whole cluster *without* -pr 
without doing *more* work than a repair *with* -pr on all nodes.
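
A rough way to check this is to brute-force a toy ring in Python. The sketch 
below assumes one token per node (no vnodes), a placement where each range 
lives on its primary node and the next RF-1 nodes clockwise, and counts "work" 
simply as the number of ranges repaired. The helper names are hypothetical.

```python
from itertools import combinations


def held_ranges(node, n_nodes, rf):
    """Ranges a node holds a replica of (what repair without -pr touches)."""
    return {(node - k) % n_nodes for k in range(rf)}


def min_work_without_pr(n_nodes, rf):
    """Fewest range repairs that cover the full ring without -pr."""
    full_ring = set(range(n_nodes))
    # Try the smallest sets of nodes first; each run touches rf ranges.
    for count in range(1, n_nodes + 1):
        for nodes in combinations(range(n_nodes), count):
            covered = set().union(*(held_ranges(n, n_nodes, rf) for n in nodes))
            if covered == full_ring:
                return count * rf
    raise ValueError("ring cannot be covered")


for n in (5, 6, 7, 8):
    # With -pr on every node, each of the n ranges is repaired exactly once.
    print(f"nodes={n} with -pr: {n} without -pr: {min_work_without_pr(n, 3)}")
```

In this model, 6 nodes need 6 range repairs either way, while 5, 7 and 8 nodes 
need 6, 9 and 9 range repairs without -pr against 5, 7 and 8 with it.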

So the advantages of -pr (which, btw, should be used to repair the whole 
cluster, not when you want to rebuild a specific node) are:
 1) it always does the minimum amount of work, while repair without -pr is 
wasteful if the number of nodes is not a multiple of the replication factor 
(no matter how smart you are at scheduling the repairs).
 2) even if your number of nodes is a multiple of the replication factor, you 
still have to make sure you pick the right N/RF nodes to repair if you don't 
use -pr. If you don't pick the correct ones, you will not repair the full ring. 
With -pr it is much harder to shoot yourself in the foot: you have to run it 
on every node, period.

--
Sylvain
