-pr is a new feature added in 1.0. It was added for efficiency, not functionality. With -pr, repair does 1/RF of the work it does without it.
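The 1/RF figure can be checked with a toy model. This is a minimal sketch, not how Cassandra itself computes ranges. It assumes a single data center, one token per node (no vnodes), and SimpleStrategy-style placement, where each range is stored on its owner and on the next RF-1 nodes clockwise. The function names are hypothetical and are not part of nodetool or Cassandra. The sketch counts how many range repairs happen when every node runs repair with and without -pr. It also checks the "every RF'th node" equivalence mentioned later in the reply, which in this model lines up exactly only when the node count is a multiple of RF.

```python
# Toy single-DC ring, assuming one token per node and SimpleStrategy-style
# placement: range i is the primary range of node i and is stored on nodes
# i, i+1, ..., i+rf-1 (wrapping around). All names here are hypothetical.

def replica_ranges(node: int, n: int, rf: int) -> set[int]:
    """Ranges (named by their primary owner) that `node` stores a copy of."""
    return {(node - k) % n for k in range(rf)}


def ranges_repaired(node: int, n: int, rf: int, primary_only: bool) -> set[int]:
    """Ranges a repair started on `node` would synchronize in this model."""
    return {node} if primary_only else replica_ranges(node, n, rf)


def total_range_repairs(nodes, n: int, rf: int, primary_only: bool) -> int:
    """Count range-repair sessions, including duplicates across nodes."""
    return sum(len(ranges_repaired(x, n, rf, primary_only)) for x in nodes)


if __name__ == "__main__":
    n, rf = 6, 3
    everyone = range(n)
    without_pr = total_range_repairs(everyone, n, rf, primary_only=False)
    with_pr = total_range_repairs(everyone, n, rf, primary_only=True)
    print(without_pr, with_pr, without_pr // with_pr)  # 18 6 3

    # Repair without -pr on every rf'th node (nodes 0 and 3 here) touches
    # each range exactly once, the same coverage as -pr on every node.
    every_rfth = [x for x in everyone if x % rf == 0]
    covered = [r for x in every_rfth for r in ranges_repaired(x, n, rf, False)]
    print(sorted(covered))  # [0, 1, 2, 3, 4, 5]
```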
> Have I understood correctly that “repair”, with or without -pr, is not a “repair
> selected node” process, but a “synchronize data range(s) between replicas”
> process?

Yes.

But if you have a node that has been down for a few hours, you may want to get its primary range repaired quickly. Or, as Sylvain says, if you are running repair on every node in the cluster, you can use -pr to reduce the duration of the repair operation. It would have the same effect as running repair without -pr on every RF'th node in the cluster.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/06/2012, at 9:19 PM, Viktor Jevdokimov wrote:

> But in any case, repair is a two-way process?
> I mean that repair without -pr on node N1 will repair N1 and N2 and N3,
> because N2 is a replica of the N1 range and N1 is a replica of the N3 range?
> And if there are more ranges that do not belong to N1, those ranges and nodes
> will not be repaired?
>
> Have I understood correctly that “repair”, with or without -pr, is not a “repair
> selected node” process, but a “synchronize data range(s) between replicas”
> process?
> Single-DC scenario:
> With -pr: synchronize data for only the primary data range of the selected node
> between all nodes for that range (max number of nodes for the range = RF).
> Without -pr: synchronize data for all data ranges of the selected node (primary
> and replica) between all nodes of those ranges (max number of nodes for the
> ranges = RF*RF). Not efficient, since the ranges overlap: the same ranges will be
> synchronized more than once (max = RF times).
> Multiple-DC scenario with 100% of the data range in each DC: the same, only
> RF = sum of the RFs from all DCs.
> Is that correct?
>
> Finally, is this process for SSTables only, excluding memtables and hints?
>
> Best regards / Pagarbiai
> Viktor Jevdokimov
> Senior Developer
>
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
> Follow us on Twitter: @adforminsider
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
> From: Sylvain Lebresne [mailto:sylv...@datastax.com]
> Sent: Tuesday, June 05, 2012 11:02
> To: user@cassandra.apache.org
> Subject: Re: nodetool repair -pr enough in this scenario?
>
> On Tue, Jun 5, 2012 at 8:44 AM, Viktor Jevdokimov
> <viktor.jevdoki...@adform.com> wrote:
> Understand simple mechanics first, decide how to act later.
>
> Without -pr it makes no difference which host you run repair from; it runs for
> the whole 100% range, from start to end, the whole cluster, all nodes, at
> once.
>
> That's not exactly true. A repair without -pr will repair all the ranges of
> the node on which repair is run. So it will only repair the ranges that the
> node is a replica for. It will *not* repair the whole cluster (unless the
> replication factor is equal to the number of nodes in the cluster, but that's
> a degenerate case). And hence it does matter on which host repair is run (it
> always matters, whether you use -pr or not).
>
> In general you want to use repair without -pr in cases where you want to
> repair a specific node. Typically, if a node was dead for a reasonably long
> time, you may want to run a repair (without -pr) on that specific node to
> have it catch up faster (faster than if you were only relying on read-repair
> and hinted-handoff).
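Sylvain's point, that a plain repair covers only the ranges the chosen node replicates, can be shown with a small toy ring. This is a minimal sketch. It assumes one token per node, SimpleStrategy-style placement, a four-node ring, and hypothetical replication factors, and its helper names are illustrative only. Only when RF equals the node count does a single repair touch every range.

```python
# Which ranges a plain (no -pr) repair on one node covers, in a simplified
# single-token, SimpleStrategy-style ring. Range i is the primary range of
# node i and is stored on nodes i, i+1, ..., i+rf-1 (wrapping around).

def replica_ranges(node: int, n: int, rf: int) -> set[int]:
    """Ranges (named by their primary owner) that `node` stores a copy of."""
    return {(node - k) % n for k in range(rf)}


def repairs_whole_cluster(node: int, n: int, rf: int) -> bool:
    """True only if a repair on `node` would touch every range in the ring."""
    return replica_ranges(node, n, rf) == set(range(n))


if __name__ == "__main__":
    n = 4  # four nodes, as in the original question
    for rf in (2, 3, 4):  # hypothetical replication factors
        print(rf, sorted(replica_ranges(0, n, rf)), repairs_whole_cluster(0, n, rf))
    # 2 [0, 3] False
    # 3 [0, 2, 3] False
    # 4 [0, 1, 2, 3] True

    # Different nodes cover different ranges, so the host you pick matters.
    print(sorted(replica_ranges(1, n, 2)))  # [0, 1]
```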
>
> For repairing a whole cluster, as is the case for the weekly scheduled
> repairs in the initial question, you want to use -pr. You *do not* want to
> use repair without -pr in that case. You do not, because for that task using
> -pr is more efficient (and to be clear, not using -pr won't cause problems,
> but it is less efficient).
>
> --
> Sylvain
>
> With -pr it runs only for the primary range of the node on which you are
> running repair.
> Let's say you have a simple ring of 3 nodes with RF=2 and ranges (per node)
> N1=C-A, N2=A-B, N3=B-C (node tokens are N1=A, N2=B, N3=C). No rack or DC
> awareness.
> So running repair with -pr on node N2 will only repair the range A-B, for which
> node N2 is the primary and N3 is the backup. N2 and N3 will synchronize the A-B
> range with each other. For the other ranges you need to run it on the other nodes.
>
> Without -pr, running on any node will repair all ranges: A-B, B-C, C-A. The node
> on which you run repair without -pr is just a repair coordinator, so it makes no
> difference which one is used next time.
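Viktor's three-node example can be encoded directly. This is a sketch under his stated assumptions: tokens A < B < C, RF=2, and no rack or DC awareness. The replicas of a range are its owner plus the next node clockwise, and the helper names are hypothetical. It shows that -pr on N2 synchronizes only A-B, between N2 and N3. Consistent with Sylvain's correction, it also shows that repair without -pr on N2 covers C-A and A-B but not B-C.

```python
# Viktor's example ring: three nodes, one token each, RF=2, SimpleStrategy-style
# placement (no rack or DC awareness). Entries are (node, token), sorted by token.
RING = [("N1", "A"), ("N2", "B"), ("N3", "C")]


def range_label(i: int) -> str:
    """Primary range of RING[i], written prev_token-own_token as in the thread."""
    prev_token = RING[i - 1][1]  # index -1 wraps around to the last node
    return f"{prev_token}-{RING[i][1]}"


def replicas(i: int, rf: int) -> list[str]:
    """Owner of range i plus the next rf-1 nodes clockwise."""
    return [RING[(i + k) % len(RING)][0] for k in range(rf)]


def repair_plan(node: str, rf: int, primary_only: bool) -> dict[str, list[str]]:
    """Map each range a repair on `node` would sync to the replicas involved."""
    plan = {}
    for i in range(len(RING)):
        reps = replicas(i, rf)
        if node not in reps:
            continue  # node holds no copy of this range
        if primary_only and reps[0] != node:
            continue  # -pr: skip ranges where node is not the primary owner
        plan[range_label(i)] = reps
    return plan


if __name__ == "__main__":
    print(repair_plan("N2", rf=2, primary_only=True))
    # {'A-B': ['N2', 'N3']}
    print(repair_plan("N2", rf=2, primary_only=False))
    # {'C-A': ['N1', 'N2'], 'A-B': ['N2', 'N3']}
```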
>
> From: David Daeschler [mailto:david.daesch...@gmail.com]
> Sent: Tuesday, June 05, 2012 08:59
> To: user@cassandra.apache.org
> Subject: nodetool repair -pr enough in this scenario?
>
> Hello,
>
> Currently I have a 4-node Cassandra cluster on CentOS64. I have been running
> nodetool repair (no -pr option) on a weekly schedule like:
>
> Host1: Tue, Host2: Wed, Host3: Thu, Host4: Fri
>
> In this scenario, if I were to add the -pr option, would this still be
> sufficient to prevent forgotten deletes and properly maintain consistency?
>
> Thank you,
> - David
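Sylvain's advice to use -pr when every node is repaired can be checked against David's weekly schedule with a small coverage count. This is a sketch under the same simplified single-token model. It assumes the hosts' token order and an RF of 3, and the thread states neither. Running -pr on all four hosts repairs each range exactly once per week, while the current schedule without -pr repairs each range RF times. The last call shows that with -pr, every host has to stay in the rotation. The sketch models range coverage only, not how often the cycle runs.

```python
from collections import Counter

# Hypothetical model of the weekly schedule from the question. The token order
# of the hosts and the replication factor are assumptions, not thread facts.
RING = ["Host1", "Host2", "Host3", "Host4"]  # assumed clockwise token order
SCHEDULE = {"Tue": "Host1", "Wed": "Host2", "Thu": "Host3", "Fri": "Host4"}


def weekly_coverage(schedule: dict[str, str], ring: list[str], rf: int,
                    primary_only: bool) -> dict[str, int]:
    """How many times each range (named by its primary owner) gets repaired."""
    n = len(ring)
    counts = Counter({host: 0 for host in ring})  # keep unrepaired ranges at 0
    for host in schedule.values():
        i = ring.index(host)
        # -pr: only the host's primary range; otherwise every range it replicates.
        owned = [i] if primary_only else [(i - k) % n for k in range(rf)]
        counts.update(ring[r] for r in owned)
    return dict(counts)


if __name__ == "__main__":
    rf = 3  # hypothetical; the thread does not state the cluster's RF
    print(weekly_coverage(SCHEDULE, RING, rf, primary_only=True))
    # {'Host1': 1, 'Host2': 1, 'Host3': 1, 'Host4': 1}
    print(weekly_coverage(SCHEDULE, RING, rf, primary_only=False))
    # {'Host1': 3, 'Host2': 3, 'Host3': 3, 'Host4': 3}

    # With -pr, skipping a day leaves that host's primary range unrepaired.
    no_thursday = {d: h for d, h in SCHEDULE.items() if d != "Thu"}
    print(weekly_coverage(no_thursday, RING, rf, primary_only=True))
    # {'Host1': 1, 'Host2': 1, 'Host3': 0, 'Host4': 1}
```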