Re: nodetool repair -pr enough in this scenario?

aaron morton Tue, 05 Jun 2012 11:45:44 -0700

-pr is a new feature added in 1.0. It was added for efficiency, not 
functionality. With -pr, repair does 1/RF of the work it does without it.
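One rough way to see where the 1/RF figure comes from is a toy model. This is a sketch under simplifying assumptions, not Cassandra code: a single DC, SimpleStrategy-style placement with one token per node, ranges numbered by the node that owns them as primary (so node j also holds the ranges of the RF-1 nodes before it), and repair cost taken as proportional to the number of ranges repaired. The function name is hypothetical.

```python
def ranges_repaired(node, n_nodes, rf, primary_only):
    """Return the ranges a repair started on `node` covers.

    Ranges are identified by the index of their primary node; node j
    also replicates the ranges of the rf-1 nodes before it on the ring.
    """
    if primary_only:
        return [node]  # -pr: only the node's primary range
    return [(node - k) % n_nodes for k in range(rf)]  # primary + replica ranges


n_nodes, rf = 6, 3
without_pr = ranges_repaired(0, n_nodes, rf, primary_only=False)
with_pr = ranges_repaired(0, n_nodes, rf, primary_only=True)
print(len(without_pr), len(with_pr))  # 3 1
```

In this model a repair without -pr on one node handles RF ranges, while -pr handles exactly one.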


> Have I understood correctly that “repair”, with or without -pr, is not a 
> “repair the selected node” process but a “synchronize data range(s) between 
> replicas” process?
Yes. 
But if you have a node that has been down for a few hours, you may want to get 
its primary range repaired quickly. 

Or, as Sylvain says, if you are running repair on every node in the cluster, you 
can use -pr to reduce the duration of the repair operation. It would have the 
same effect as running repair without -pr on every RF'th node in the cluster. 
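That equivalence can be checked with a small simulation under the same toy assumptions (single DC, one token per node, replicas on the next RF-1 nodes clockwise), plus the extra condition that the number of nodes is a multiple of RF; otherwise repairing every RF'th node repairs some ranges more than once. The helper names are hypothetical.

```python
from collections import Counter


def replica_ranges(node, n_nodes, rf):
    """Ranges held by `node`: its primary range plus those of the rf-1 preceding nodes."""
    return [(node - k) % n_nodes for k in range(rf)]


def coverage(repairs):
    """Count how many times each range is repaired across a list of repairs."""
    return Counter(r for ranges in repairs for r in ranges)


n_nodes, rf = 6, 3  # assumes n_nodes is a multiple of rf
pr_on_every_node = coverage([[node] for node in range(n_nodes)])
full_on_every_rfth = coverage(
    [replica_ranges(node, n_nodes, rf) for node in range(0, n_nodes, rf)]
)
print(pr_on_every_node == full_on_every_rfth)  # True: each range repaired exactly once
```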

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/06/2012, at 9:19 PM, Viktor Jevdokimov wrote:

> But in any case, repair is a two-way process?
> I mean that repair without -pr on node N1 will repair N1, N2 and N3, 
> because N2 is a replica of N1's range and N1 is a replica of N3's range?
> And if there are more ranges that do not belong to N1, those ranges and 
> nodes will not be repaired?
>  
>  
> Have I understood correctly that “repair”, with or without -pr, is not a 
> “repair the selected node” process but a “synchronize data range(s) between 
> replicas” process?
> Single DC scenario:
> With -pr: synchronize data for only the primary range of the selected node 
> between all nodes for that range (max number of nodes for the range = RF).
> Without -pr: synchronize data for all ranges of the selected node (primary 
> and replica) between all nodes for those ranges (max number of nodes for the 
> ranges = RF*RF). Not efficient, since the ranges overlap and the same ranges 
> will be synchronized more than once (max = RF times).
> Multiple DC scenario, with 100% of the data range in each DC: the same, only 
> RF = sum of the RFs of all DCs.
> Is that correct?
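The single-DC description can be explored with the same kind of toy model (one token per node, replicas on the next RF-1 nodes clockwise, at least 2*RF-1 nodes; multi-DC placement is not modelled). In this model a repair without -pr on one node handles RF ranges with RF replicas each, i.e. RF*RF range-replica pairs, but because neighbouring ranges share replicas only 2*RF-1 distinct nodes take part; and running it on every node synchronizes each range RF times. The helper names are hypothetical.

```python
from collections import Counter


def replicas(range_id, n_nodes, rf):
    """Nodes holding a range: its primary node and the next rf-1 nodes clockwise."""
    return [(range_id + k) % n_nodes for k in range(rf)]


def replica_ranges(node, n_nodes, rf):
    """Ranges held by a node: its primary range and those of the rf-1 nodes before it."""
    return [(node - k) % n_nodes for k in range(rf)]


n_nodes, rf = 8, 3

# Repair without -pr on node 0: every (range, replica) pair it has to synchronize.
pairs = [(r, m) for r in replica_ranges(0, n_nodes, rf) for m in replicas(r, n_nodes, rf)]
distinct_nodes = {m for _, m in pairs}
print(len(pairs), len(distinct_nodes))  # 9 5, i.e. rf*rf pairs but 2*rf-1 nodes

# Repair without -pr on every node: how often each range gets synchronized.
syncs = Counter(r for node in range(n_nodes) for r in replica_ranges(node, n_nodes, rf))
print(set(syncs.values()))  # {3}: every range is synchronized rf times
```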
>  
> Finally, is this process for SSTables only, excluding memtables and hints?
>  
>  
>  
> 
> 
> Best regards / Pagarbiai
> Viktor Jevdokimov
> Senior Developer
> 
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
> Follow us on Twitter: @adforminsider
> 
> Disclaimer: The information contained in this message and attachments is 
> intended solely for the attention and use of the named addressee and may be 
> confidential. If you are not the intended recipient, you are reminded that 
> the information remains the property of the sender. You must not use, 
> disclose, distribute, copy, print or rely on this e-mail. If you have 
> received this message in error, please contact the sender immediately and 
> irrevocably delete this message and any copies.
> 
> From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
> Sent: Tuesday, June 05, 2012 11:02
> To: user@cassandra.apache.org
> Subject: Re: nodetool repair -pr enough in this scenario?
>  
> On Tue, Jun 5, 2012 at 8:44 AM, Viktor Jevdokimov 
> <viktor.jevdoki...@adform.com> wrote:
> Understand simple mechanics first, decide how to act later.
>  
> Without -pr it makes no difference which host you run repair from: it runs 
> for the whole 100% range, from start to end, the whole cluster, all nodes, at 
> once.
>  
> That's not exactly true. A repair without -pr will repair all the ranges of 
> the node on which repair is run. So it will only repair the ranges that the 
> node is a replica for. It will *not* repair the whole cluster (unless the 
> replication factor is equal to the number of nodes in the cluster, but that's 
> a degenerate case). And hence it does matter on which host repair is run (it 
> always matters, whether you use -pr or not).
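Sylvain's point can be checked in a toy model (single DC, one token per node, replicas on the next RF-1 nodes clockwise): a repair without -pr covers only the node's own primary and replica ranges, which equal the whole ring only when RF equals the number of nodes, and different hosts cover different sets. The function names are hypothetical.

```python
def replica_ranges(node, n_nodes, rf):
    """Ranges a repair without -pr on `node` covers (primary + replica ranges)."""
    return {(node - k) % n_nodes for k in range(rf)}


def repairs_whole_cluster(node, n_nodes, rf):
    return replica_ranges(node, n_nodes, rf) == set(range(n_nodes))


print(repairs_whole_cluster(0, n_nodes=4, rf=3))  # False
print(repairs_whole_cluster(0, n_nodes=3, rf=3))  # True (the degenerate RF == N case)
print(replica_ranges(0, 4, 3) == replica_ranges(1, 4, 3))  # False: the host matters
```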
>  
> In general you want to use repair without -pr in cases where you want to 
> repair a specific node. Typically, if a node was dead for a reasonably long 
> time, you may want to run a repair (without -pr) on that specific node to 
> have it catch up faster (faster than if you were only relying on read-repair 
> and hinted handoff).
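If you script this, a minimal sketch might just build the nodetool invocation for the node that was down. The host name `node3.example.internal` and the helper `repair_command` are hypothetical, and the sketch assumes `nodetool` is on the PATH; it only uses the `-h` host option and the `-pr` flag discussed in this thread, and it does not run anything.

```python
import shlex


def repair_command(host, primary_range_only=False):
    """Build (without running) a nodetool repair invocation for one host."""
    cmd = ["nodetool", "-h", host, "repair"]
    if primary_range_only:
        cmd.append("-pr")
    return cmd


# Hypothetical host that was down for a while: repair all of its ranges.
catch_up = repair_command("node3.example.internal")
print(shlex.join(catch_up))  # nodetool -h node3.example.internal repair
```

To actually execute it, the list can be passed to `subprocess.run(catch_up, check=True)`.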
>  
> For repairing a whole cluster, as is the case for the weekly scheduled 
> repairs in the initial question, you want to use -pr. You *do not* want to 
> use repair without -pr in that case, because for that task using -pr is more 
> efficient (and to be clear, not using -pr won't cause problems, but it is 
> less efficient).
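To make "won't cause problems, but it is less efficient" concrete, here is a toy comparison (single DC, one token per node, replicas on the next RF-1 nodes clockwise, cost counted as ranges repaired). Both approaches cover every range when run on every node; the difference is only how much work is repeated. The helper names are hypothetical.

```python
def ranges_repaired(node, n_nodes, rf, primary_only):
    """Ranges covered by a repair on `node`, identified by their primary node."""
    if primary_only:
        return [node]
    return [(node - k) % n_nodes for k in range(rf)]


def repair_every_node(n_nodes, rf, primary_only):
    """Return (all ranges covered?, total range repairs performed)."""
    work = [r for node in range(n_nodes)
            for r in ranges_repaired(node, n_nodes, rf, primary_only)]
    return set(work) == set(range(n_nodes)), len(work)


print(repair_every_node(4, 3, primary_only=True))   # (True, 4)
print(repair_every_node(4, 3, primary_only=False))  # (True, 12)
```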
>  
> --
> Sylvain
>  
>  
>  
> With -pr it runs only for the primary range of the node on which you are 
> running repair.
> Let's say you have a simple ring of 3 nodes with RF=2 and ranges (per node) 
> N1=C-A, N2=A-B, N3=B-C (node tokens are N1=A, N2=B, N3=C). No rack or DC 
> awareness.
> So running repair with -pr on node N2 will only repair range A-B, for which 
> N2 is the primary and N3 is the backup. N2 and N3 will synchronize the A-B 
> range with each other. For the other ranges you need to run repair on the 
> other nodes.
>  
> Without -pr, running on any node will repair all ranges: A-B, B-C, C-A. The 
> node on which you run repair without -pr is just a repair coordinator, so it 
> makes no difference which one you pick next time.
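Viktor's 3-node ring can be modelled directly to compare the two cases. The sketch assumes the placement he describes (each node's primary range ends at its token, and the second replica is the next node clockwise). In this model, -pr on N2 syncs only A-B between N2 and N3, matching his description, while repair without -pr on N2 covers the ranges N2 holds (A-B and C-A) but not B-C, which is Sylvain's correction. The names are hypothetical.

```python
# Viktor's example ring: tokens N1=A, N2=B, N3=C, RF=2, no rack/DC awareness.
RING = ["N1", "N2", "N3"]
PRIMARY_RANGE = {"N1": "C-A", "N2": "A-B", "N3": "B-C"}
RF = 2


def replicas(owner):
    """Replicas of `owner`'s primary range: the owner and the next RF-1 nodes."""
    i = RING.index(owner)
    return [RING[(i + k) % len(RING)] for k in range(RF)]


def repair(node, primary_only):
    """Map each range a repair on `node` covers to the replicas that sync it."""
    i = RING.index(node)
    owners = [node] if primary_only else [RING[(i - k) % len(RING)] for k in range(RF)]
    return {PRIMARY_RANGE[o]: replicas(o) for o in owners}


print(repair("N2", primary_only=True))   # {'A-B': ['N2', 'N3']}
print(repair("N2", primary_only=False))  # {'A-B': ['N2', 'N3'], 'C-A': ['N1', 'N2']}
```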
>  
> From: David Daeschler [mailto:david.daesch...@gmail.com] 
> Sent: Tuesday, June 05, 2012 08:59
> To: user@cassandra.apache.org
> Subject: nodetool repair -pr enough in this scenario?
>  
> Hello,
>  
> Currently I have a 4-node Cassandra cluster on CentOS64. I have been running 
> nodetool repair (no -pr option) on a weekly schedule like:
>  
> Host1: Tue, Host2: Wed, Host3: Thu, Host4: Fri
>  
> In this scenario, if I were to add the -pr option, would this still be 
> sufficient to prevent forgotten deletes and properly maintain consistency?
>  
> Thank you,
> - David
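For the weekly schedule in the question, a sketch of the check that matters for forgotten deletes: with -pr, each range is repaired only by the run on its primary node, so every host has to run `repair -pr`, and the time between successive runs on the same host should stay under gc_grace_seconds. The 10-day value below is assumed to be Cassandra's default `gc_grace_seconds` (864000); substitute the setting your column families actually use. Host names and weekdays come from the question; the start date is arbitrary, and the check ignores how long each repair takes to finish.

```python
from datetime import date, timedelta

# Assumption: Cassandra's default gc_grace_seconds (864000 s = 10 days).
GC_GRACE = timedelta(seconds=864_000)
# Schedule from the question, as date.weekday() numbers (Mon=0, Tue=1, ..., Fri=4).
SCHEDULE = {"Host1": 1, "Host2": 2, "Host3": 3, "Host4": 4}


def run_dates(weekday, start, weeks):
    """Dates of a weekly run on `weekday`, beginning on or after `start`."""
    first = start + timedelta(days=(weekday - start.weekday()) % 7)
    return [first + timedelta(weeks=w) for w in range(weeks)]


start = date(2012, 6, 4)  # arbitrary start date (a Monday)
gaps = {}
for host, weekday in SCHEDULE.items():
    runs = run_dates(weekday, start, weeks=4)
    # Longest interval between two -pr runs on the same host.
    gaps[host] = max(b - a for a, b in zip(runs, runs[1:]))
    print(f"nodetool -h {host} repair -pr", runs[0], gaps[host].days, gaps[host] < GC_GRACE)
```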
