Thank you for confirming that the per-node data size is most likely what is causing the long repair process. I tried a repair on smaller column families and it was significantly faster.

On Wed, Apr 11, 2012 at 9:55 PM, aaron morton <aa...@thelastpickle.com> wrote:

> If you have 1TB of data it will take a long time to repair. Every bit of
> data has to be read and a hash generated. This is one of the reasons we
> often suggest that around 300 to 400GB per node is a good load in the
> general case.
>
> Look at nodetool compactionstats. Is there a validation compaction running?
> If so it is still building the merkle hash tree.
>
> Look at nodetool netstats. Is it streaming data? If so all hash trees
> have been calculated.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 12/04/2012, at 2:16 AM, Frank Ng wrote:
>
> Can you expand further on your issue? Were you using Random Partitioner?
>
> thanks
>
> On Tue, Apr 10, 2012 at 5:35 PM, David Leimbach <leim...@gmail.com> wrote:
>
>> I had this happen when I had really poorly generated tokens for the ring.
>> Cassandra seems to accept numbers that are too big. You get hot spots
>> when you think you should be balanced and repair never ends (I think there
>> is a 48 hour timeout).
>>
>> On Tuesday, April 10, 2012, Frank Ng wrote:
>>
>>> I am not using tier-sized compaction.
>>>
>>> On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone <rh...@tinyco.com> wrote:
>>>
>>>> Data size, number of nodes, RF?
>>>>
>>>> Are you using size-tiered compaction on any of the column families that
>>>> hold a lot of your data?
>>>>
>>>> Do your cassandra logs say you are streaming a lot of ranges?
>>>> zgrep -E "(Performing streaming repair|out of sync)"
>>>>
>>>> On Tue, Apr 10, 2012 at 9:45 AM, Igor <i...@4friends.od.ua> wrote:
>>>>
>>>>> On 04/10/2012 07:16 PM, Frank Ng wrote:
>>>>>
>>>>> Short answer - yes.
>>>>> But you are asking wrong question.
>>>>>
>>>>> I think both processes are taking a while. When it starts up,
>>>>> netstats and compactionstats show nothing. Anyone out there successfully
>>>>> using ext3 and their repair processes are faster than this?
>>>>>
>>>>> On Tue, Apr 10, 2012 at 10:42 AM, Igor <i...@4friends.od.ua> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> You can check with nodetool which part of repair process is slow -
>>>>>> network streams or verify compactions. use nodetool netstats or
>>>>>> compactionstats.
>>>>>>
>>>>>> On 04/10/2012 05:16 PM, Frank Ng wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am on Cassandra 1.0.7. My repair processes are taking over 30
>>>>>>> hours to complete. Is it normal for the repair process to take this
>>>>>>> long?
>>>>>>> I wonder if it's because I am using the ext3 file system.
>>>>>>>
>>>>>>> thanks
>>>>
>>>> --
>>>> Jonathan Rhone
>>>> Software Engineer
>>>>
>>>> *TinyCo*
>>>> 800 Market St., Fl 6
>>>> San Francisco, CA 94102
>>>> www.tinyco.com
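
For anyone reading this thread because of the token issue David mentions (poorly generated tokens causing hot spots and repairs that never finish), here is a minimal sketch of how evenly spaced initial tokens are commonly computed for RandomPartitioner. It assumes the usually documented token range of 0 to 2**127 - 1; the names `RING_SIZE`, `balanced_tokens` and `is_valid_token` are made up for illustration and are not part of Cassandra or nodetool.

```python
# Sketch only: evenly spaced initial tokens for a RandomPartitioner ring.
# Assumption: valid tokens lie in 0 .. 2**127 - 1 (the commonly documented range).
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return one token per node, spaced evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic avoids float rounding on 127-bit values.
    return [i * RING_SIZE // node_count for i in range(node_count)]


def is_valid_token(token):
    """Check that a token falls inside the assumed RandomPartitioner range."""
    return 0 <= token < RING_SIZE


if __name__ == "__main__":
    for node, token in enumerate(balanced_tokens(4)):
        print(f"node {node}: initial_token = {token}")
```

Comparing the output against the tokens shown by nodetool ring is one way to spot a token that is out of range or a ring that is unevenly split, which may be worth ruling out before blaming the file system.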