I also noticed that when I used the -pr option, the repair time went down from 30 hours to 9 hours. Is the -pr option safe to use if I want to run repair processes in parallel on nodes that are not replication peers?
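Concretely, something like the following is what I have in mind (the host names, keyspace name, and 6-node RF=3 layout are made up for illustration):

    # With -pr each node repairs only its primary range, so the whole ring
    # is covered once every node has run it. In an assumed 6-node, RF=3 ring,
    # nodes 3 positions apart share no replicas, so their repairs can overlap.
    nodetool -h node1 repair -pr my_keyspace &
    nodetool -h node4 repair -pr my_keyspace &
    wait
    nodetool -h node2 repair -pr my_keyspace &
    nodetool -h node5 repair -pr my_keyspace &
    wait
    nodetool -h node3 repair -pr my_keyspace &
    nodetool -h node6 repair -pr my_keyspace &
    wait

Note that with -pr every node still has to run its own repair eventually for the full ring to be covered.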
thanks

On Thu, Apr 12, 2012 at 12:06 AM, Frank Ng <berryt...@gmail.com> wrote:

> Thank you for confirming that the per-node data size is most likely
> causing the long repair process. I have tried a repair on smaller column
> families and it was significantly faster.
>
> On Wed, Apr 11, 2012 at 9:55 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> If you have 1TB of data it will take a long time to repair. Every bit of
>> data has to be read and a hash generated. This is one of the reasons we
>> often suggest that around 300 to 400GB per node is a good load in the
>> general case.
>>
>> Look at nodetool compactionstats. Is there a validation compaction
>> running? If so, it is still building the Merkle hash tree.
>>
>> Look at nodetool netstats. Is it streaming data? If so, all hash trees
>> have been calculated.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 12/04/2012, at 2:16 AM, Frank Ng wrote:
>>
>> Can you expand further on your issue? Were you using RandomPartitioner?
>>
>> thanks
>>
>> On Tue, Apr 10, 2012 at 5:35 PM, David Leimbach <leim...@gmail.com> wrote:
>>
>>> I had this happen when I had really poorly generated tokens for the
>>> ring. Cassandra seems to accept numbers that are too big. You get hot
>>> spots when you think you should be balanced, and repair never ends (I
>>> think there is a 48-hour timeout).
>>>
>>> On Tuesday, April 10, 2012, Frank Ng wrote:
>>>
>>>> I am not using size-tiered compaction.
>>>>
>>>> On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone <rh...@tinyco.com> wrote:
>>>>
>>>>> Data size, number of nodes, RF?
>>>>>
>>>>> Are you using size-tiered compaction on any of the column families
>>>>> that hold a lot of your data?
>>>>>
>>>>> Do your cassandra logs say you are streaming a lot of ranges?
>>>>> zgrep -E "(Performing streaming repair|out of sync)"
>>>>>
>>>>> On Tue, Apr 10, 2012 at 9:45 AM, Igor <i...@4friends.od.ua> wrote:
>>>>>
>>>>>> On 04/10/2012 07:16 PM, Frank Ng wrote:
>>>>>>
>>>>>> Short answer: yes.
>>>>>> But you are asking the wrong question.
>>>>>>
>>>>>> I think both processes are taking a while. When it starts up,
>>>>>> netstats and compactionstats show nothing. Is anyone out there
>>>>>> successfully using ext3 with repair processes faster than this?
>>>>>>
>>>>>> On Tue, Apr 10, 2012 at 10:42 AM, Igor <i...@4friends.od.ua> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> You can check with nodetool which part of the repair process is
>>>>>>> slow: network streams or validation compactions. Use nodetool
>>>>>>> netstats or compactionstats.
>>>>>>>
>>>>>>> On 04/10/2012 05:16 PM, Frank Ng wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I am on Cassandra 1.0.7. My repair processes are taking over 30
>>>>>>>> hours to complete. Is it normal for the repair process to take
>>>>>>>> this long? I wonder if it's because I am using the ext3 file
>>>>>>>> system.
>>>>>>>>
>>>>>>>> thanks
>>>>>
>>>>> --
>>>>> Jonathan Rhone
>>>>> Software Engineer
>>>>>
>>>>> TinyCo
>>>>> 800 Market St., Fl 6
>>>>> San Francisco, CA 94102
>>>>> www.tinyco.com
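For anyone following along, a quick sketch of the two-phase check Aaron describes above (the host name and polling interval are placeholders):

    # "Validation" rows in compactionstats mean the Merkle trees are still
    # being built; active streams in netstats mean hashing is done and the
    # out-of-sync ranges are being transferred.
    while true; do
        nodetool -h node1 compactionstats
        nodetool -h node1 netstats
        sleep 60
    done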
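And on David's point about tokens that are too big: with RandomPartitioner the valid range is 0 to 2^127 - 1, and balanced initial tokens are i * 2^127 / N. A quick way to print them for comparison against nodetool ring (N=6 is an assumed cluster size; bc is used because the values exceed native shell integer arithmetic):

    # Print balanced RandomPartitioner tokens for an assumed N-node ring.
    N=6
    for i in $(seq 0 $((N - 1))); do
        echo "$i * 2^127 / $N" | bc
    done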