On Thu, Apr 12, 2012 at 4:06 PM, Frank Ng <buzzt...@gmail.com> wrote:
> I also noticed that if I use the -pr option, the repair process went down
> from 30 hours to 9 hours. Is the -pr option safe to use if I want to run
> repair processes in parallel on nodes that are not replication peers?

There are pretty much two use cases for repair:
1) to rebuild a node: if, say, a node has lost some data due to hard drive
   corruption or the like and you want to rebuild what's missing
2) the periodic repairs to avoid problems with deleted data coming back from
   the dead (basically:
   http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair)

In case 1) you want to run 'nodetool repair' (without -pr) against the node
to rebuild.

In case 2) (which I suspect is the case you're talking about now), you *want*
to use 'nodetool repair -pr' on *every* node of the cluster. I.e. that's the
most efficient way to do it. The only reason not to use -pr in this case
would be that it's not available because you're using an old version of
Cassandra. And yes, it is safe to run with -pr in parallel on nodes that are
not replication peers.

--
Sylvain

>
> thanks
>
>
> On Thu, Apr 12, 2012 at 12:06 AM, Frank Ng <berryt...@gmail.com> wrote:
>>
>> Thank you for confirming that the per node data size is most likely
>> causing the long repair process. I have tried a repair on smaller column
>> families and it was significantly faster.
>>
>> On Wed, Apr 11, 2012 at 9:55 PM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>>>
>>> If you have 1TB of data it will take a long time to repair. Every bit of
>>> data has to be read and a hash generated. This is one of the reasons we
>>> often suggest that around 300 to 400GB per node is a good load in the
>>> general case.
>>>
>>> Look at nodetool compactionstats. Is there a validation compaction
>>> running? If so it is still building the merkle hash tree.
>>>
>>> Look at nodetool netstats. Is it streaming data? If so all hash trees
>>> have been calculated.
>>>
>>> Cheers
>>>
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 12/04/2012, at 2:16 AM, Frank Ng wrote:
>>>
>>> Can you expand further on your issue? Were you using Random Partitioner?
>>>
>>> thanks
>>>
>>> On Tue, Apr 10, 2012 at 5:35 PM, David Leimbach <leim...@gmail.com>
>>> wrote:
>>>>
>>>> I had this happen when I had really poorly generated tokens for the
>>>> ring. Cassandra seems to accept numbers that are too big. You get hot
>>>> spots when you think you should be balanced and repair never ends (I think
>>>> there is a 48 hour timeout).
>>>>
>>>>
>>>> On Tuesday, April 10, 2012, Frank Ng wrote:
>>>>>
>>>>> I am not using size-tiered compaction.
>>>>>
>>>>>
>>>>> On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone <rh...@tinyco.com>
>>>>> wrote:
>>>>>>
>>>>>> Data size, number of nodes, RF?
>>>>>>
>>>>>> Are you using size-tiered compaction on any of the column families
>>>>>> that hold a lot of your data?
>>>>>>
>>>>>> Do your cassandra logs say you are streaming a lot of ranges?
>>>>>> zgrep -E "(Performing streaming repair|out of sync)"
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 10, 2012 at 9:45 AM, Igor <i...@4friends.od.ua> wrote:
>>>>>>>
>>>>>>> On 04/10/2012 07:16 PM, Frank Ng wrote:
>>>>>>>
>>>>>>> Short answer - yes.
>>>>>>> But you are asking the wrong question.
>>>>>>>
>>>>>>>
>>>>>>> I think both processes are taking a while. When it starts up,
>>>>>>> netstats and compactionstats show nothing. Anyone out there
>>>>>>> successfully
>>>>>>> using ext3 and their repair processes are faster than this?
>>>>>>>
>>>>>>> On Tue, Apr 10, 2012 at 10:42 AM, Igor <i...@4friends.od.ua> wrote:
>>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> You can check with nodetool which part of the repair process is slow -
>>>>>>>> network streams or validation compactions. Use nodetool netstats or
>>>>>>>> compactionstats.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 04/10/2012 05:16 PM, Frank Ng wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am on Cassandra 1.0.7. My repair processes are taking over 30
>>>>>>>>> hours to complete. Is it normal for the repair process to take this
>>>>>>>>> long?
>>>>>>>>> I wonder if it's because I am using the ext3 file system.
>>>>>>>>>
>>>>>>>>> thanks
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jonathan Rhone
>>>>>> Software Engineer
>>>>>>
>>>>>> TinyCo
>>>>>> 800 Market St., Fl 6
>>>>>> San Francisco, CA 94102
>>>>>> www.tinyco.com
>>>>>>
>>>>>
>>>
>>>
>>
>
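
For anyone who wants to script the periodic 'nodetool repair -pr' runs discussed at the top of this thread, here is a rough sketch of one way to pick nodes that can be repaired at the same time. It is only an illustration under stated assumptions, not anything from Cassandra itself: it assumes a single data center where the replicas of a node's primary range are that node plus the next RF-1 nodes in token order (SimpleStrategy-like placement), and it treats two nodes as safe to repair in parallel when those replica sets do not overlap. The function names and the host names n1..n6 are made up for the example. It only prints the commands, it does not run them.

```python
from typing import List, Set


def replica_set(ring: List[str], index: int, rf: int) -> Set[str]:
    """Nodes assumed to hold the primary range of ring[index].

    Assumption: the owner plus the next rf-1 nodes in token order.
    """
    n = len(ring)
    return {ring[(index + k) % n] for k in range(min(rf, n))}


def pr_repair_batches(ring: List[str], rf: int) -> List[List[str]]:
    """Greedily group nodes so that, within a batch, no two -pr repairs
    involve a common node. `ring` must list the nodes in token order."""
    if rf < 1:
        raise ValueError("rf must be at least 1")
    batches: List[List[str]] = []
    busy: List[Set[str]] = []  # nodes already involved in each batch
    for i, node in enumerate(ring):
        involved = replica_set(ring, i, rf)
        for nodes, taken in zip(batches, busy):
            if taken.isdisjoint(involved):
                nodes.append(node)
                taken |= involved
                break
        else:
            # No existing batch is free of overlap, so start a new one.
            batches.append([node])
            busy.append(set(involved))
    return batches


if __name__ == "__main__":
    ring = ["n1", "n2", "n3", "n4", "n5", "n6"]  # hypothetical hosts, token order
    for number, batch in enumerate(pr_repair_batches(ring, rf=3), start=1):
        print(f"batch {number} (can run in parallel):")
        for host in batch:
            print(f"  nodetool -h {host} repair -pr")
```

Each batch would be run to completion before starting the next one. With multiple data centers or NetworkTopologyStrategy the replica placement is different, so the replica_set function above would need to be replaced with something that reflects the real placement.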