Hello,

I'm looking at running nodetool repair with and without the "-pr" option.  Looking 
around, I'm seeing a lot of conflicting information out there.  Almost 
universally, the recommendation is to run nodetool repair with "-pr" for 
any day-to-day maintenance.

This is my understanding of how it works.  I appreciate any corrections to my 
misinformation.

nodetool repair -pr

- This performs a repair on the "primary range" of the node.  The primary range 
is essentially the part of the ring that the node is responsible for as the 
first replica.  When this command is run, replicas are synchronized for the rows 
in that range.  If the neighbors that hold replicas of those rows are missing 
data, it will be replicated to them.
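To make the "primary range" idea concrete, here is a minimal Python sketch. It 
rests on simplifying assumptions of my own rather than on Cassandra's actual 
code: one token per node (no vnodes) and a hypothetical 4-node ring with 
made-up node names A-D and tokens 0/25/50/75.  Each node's primary range runs 
from its predecessor's token (exclusive) up to its own token (inclusive), and 
the lowest node's range wraps around the end of the ring.

```python
from typing import Dict, List, Tuple

# Hypothetical 4-node ring, one token per node (no vnodes).
RING: Dict[str, int] = {"A": 0, "B": 25, "C": 50, "D": 75}

# A token range as (start, end]: start is exclusive, end is inclusive.
Range = Tuple[int, int]


def sorted_nodes(ring: Dict[str, int]) -> List[str]:
    """Node names in ring order (ascending token)."""
    return sorted(ring, key=ring.__getitem__)


def primary_range(ring: Dict[str, int], node: str) -> Range:
    """Range from the predecessor's token to this node's own token."""
    nodes = sorted_nodes(ring)
    i = nodes.index(node)
    prev = nodes[i - 1]  # for the first node, index -1 wraps to the last node
    return (ring[prev], ring[node])


for n in sorted_nodes(RING):
    print(n, primary_range(RING, n))
```

Under these assumptions node A's primary range is (75, 0], i.e. it wraps past 
the highest token, and every other node owns the slice just below its token.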

nodetool repair

- This is where I see a lot of conflicting information.  Many answers suggest 
that this command performs a repair across the entire cluster.  However, from 
my observations I don't believe this is true (and some of the items I've read 
seem to agree).  Instead, this command synchronizes the node's primary range, 
but also the other ranges that this node may be responsible for as a replica.  
The way I'm thinking about it is that the -pr option causes repair to push 
information from the node's primary range to its replicas.  Without -pr, 
nodetool repair does both a push and a pull for the neighbors that this node 
may be a replica for.  This makes sense to me, as people recommend running 
nodetool repair after a node has been down: it lets the downed node get any 
information that should have been replicated to it while it was down.
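Here is a rough Python model of which ranges each variant would cover, under my 
understanding above.  The assumptions are mine and hypothetical: the same 
made-up 4-node ring (A-D, tokens 0/25/50/75), one token per node, a replication 
factor of 3, and SimpleStrategy-style placement where a range's replicas are its 
primary owner plus the next RF - 1 nodes clockwise.  It only models range 
coverage, not how differences are detected or which direction data flows, and 
vnodes or NetworkTopologyStrategy would change the numbers.

```python
from collections import Counter
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (start, end]: start exclusive, end inclusive

# Hypothetical ring and replication factor (assumptions, not real config).
RING: Dict[str, int] = {"A": 0, "B": 25, "C": 50, "D": 75}
RF = 3


def sorted_nodes(ring: Dict[str, int]) -> List[str]:
    return sorted(ring, key=ring.__getitem__)


def primary_range(ring: Dict[str, int], node: str) -> Range:
    nodes = sorted_nodes(ring)
    i = nodes.index(node)
    return (ring[nodes[i - 1]], ring[node])  # i - 1 wraps for the first node


def replicas_of(ring: Dict[str, int], owner: str, rf: int) -> List[str]:
    """Primary owner plus the next rf - 1 nodes clockwise (SimpleStrategy-like)."""
    nodes = sorted_nodes(ring)
    i = nodes.index(owner)
    return [nodes[(i + k) % len(nodes)] for k in range(rf)]


def ranges_repaired(ring: Dict[str, int], node: str, rf: int,
                    primary_only: bool) -> List[Range]:
    if primary_only:
        # "-pr": only the range this node is the first replica for.
        return [primary_range(ring, node)]
    # No "-pr": every range this node holds a replica of.
    return [primary_range(ring, owner) for owner in sorted_nodes(ring)
            if node in replicas_of(ring, owner, rf)]


def repair_counts(ring: Dict[str, int], rf: int, primary_only: bool) -> Counter:
    """How many times each range is repaired if every node runs the command."""
    counts: Counter = Counter()
    for node in sorted_nodes(ring):
        counts.update(ranges_repaired(ring, node, rf, primary_only))
    return counts


print(ranges_repaired(RING, "C", RF, primary_only=False))
print(repair_counts(RING, RF, primary_only=True))
print(repair_counts(RING, RF, primary_only=False))
```

If this model is right, repair on a single node never touches the whole cluster 
either way: node C covers one range with -pr and three ranges without it.  
Running -pr on every node covers each range exactly once, while running plain 
repair on every node repairs each range RF (here 3) times.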

I'm sure there are lots of flaws in the above understanding, as I'm cobbling it 
together.  I appreciate the feedback,

-Mike
