The log message matches what I would expect to see for nodetool repair -pr. Not using -pr means repairing all the ranges the node is a replica for. If you have RF == number of nodes, then it will repair all the data.
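For reference, roughly (the keyspace name below is just a placeholder):

nodetool -h localhost repair -pr MyKeyspace    # primary range of this node only
nodetool -h localhost repair MyKeyspace        # every range this node is a replica for

With RF equal to the number of nodes in each DC, the second form ends up walking over all of the data on the node.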
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/12/2012, at 9:42 PM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:

> Thanks!
>
> i'm also thinking a repair run without -pr could have caused this maybe ?
>
>
> Andras Szerdahelyi
> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
> M: +32 493 05 50 88 | Skype: sandrew84
>
>
> On 06 Dec 2012, at 04:05, aaron morton <aa...@thelastpickle.com> wrote:
>
>>> - how do i stop repair before i run out of storage? ( can't let this finish )
>>
>> To stop the validation part of the repair…
>>
>> nodetool -h localhost stop VALIDATION
>>
>> The only way I know to stop streaming is to restart the node; there may be a better way though.
>>
>>> INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java (line 666) [repair #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will sync /X.X.1.113, /X.X.0.71 on range (85070591730234615865843651857942052964,0] for ( .. )
>>
>> I'm assuming this was run on the first node in DC west with -pr, as you said. The log message is saying this is going to repair the primary range for the node. The repair is then actually performed one CF at a time.
>>
>> You should also see log messages ending with "range(s) out of sync" which will say how out of sync the data is.
>>
>>> - how do i clean up my sstables ( grew from 6k to 20k since this started, while i shut writes off completely )
>>
>> Sounds like repair is streaming a lot of differences.
>> If you have the space I would give Levelled compaction time to take care of it.
>>
>> Hope that helps.
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 6/12/2012, at 1:32 AM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:
>>
>>> hi list,
>>>
>>> AntiEntropyService started syncing ranges of entire nodes ( ?! ) across my data centers and i'd like to understand why.
>>>
>>> I see log lines like this on all my nodes in my two ( east/west ) data centres...
>>>
>>> INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java (line 666) [repair #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will sync /X.X.1.113, /X.X.0.71 on range (85070591730234615865843651857942052964,0] for ( .. )
>>>
>>> ( this is around 80-100 GB of data for a single node. )
>>>
>>> - i did not observe any network failures or nodes falling off the ring
>>> - good distribution of data ( load is equal on all nodes )
>>> - hinted handoff is on
>>> - read repair chance is 0.1 on the CF
>>> - 2 replicas in each data centre ( which is also the number of nodes in each ) with NetworkTopologyStrategy
>>> - repair -pr is scheduled to run off-peak hours, daily
>>> - leveled compaction with sstable max size 256mb ( i have found this to trigger compaction in acceptable intervals while still keeping the sstable count down )
>>> - i am on 1.1.6
>>> - java heap 10G
>>> - max memtables 2G
>>> - 1G row cache
>>> - 256M key cache
>>>
>>> my nodes' ranges are:
>>>
>>> DC west
>>> 0
>>> 85070591730234615865843651857942052864
>>>
>>> DC east
>>> 100
>>> 85070591730234615865843651857942052964
>>>
>>> symptoms are:
>>> - logs show sstables being streamed over to other nodes
>>> - 140k files in data dir of CF on all nodes
>>> - cfstats reports 20k sstables, up from 6k on all nodes
>>> - compaction continuously running with no results whatsoever ( number of sstables growing )
>>>
>>> i tried the following:
>>> - offline scrub ( has gone OOM, i noticed the script in the debian package specifies 256MB heap? )
>>> - online scrub ( no effect )
>>> - repair ( no effect )
>>> - cleanup ( no effect )
>>>
>>> my questions are:
>>> - how do i stop repair before i run out of storage? ( can't let this finish )
>>> - how do i clean up my sstables ( grew from 6k to 20k since this started, while i shut writes off completely )
>>>
>>> thanks,
>>> Andras
>>>
>>> Andras Szerdahelyi
>>> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
>>> M: +32 493 05 50 88 | Skype: sandrew84
>>>
>>