> - how do I stop repair before I run out of storage? (can't let this finish)

To stop the validation part of the repair:

    nodetool -h localhost stop VALIDATION

The only way I know to stop the streaming is to restart the node; there may be a better way, though.
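If you want to check what is actually still running before restarting, something like the following should show it (a rough sketch; assumes nodetool is on the path and JMX is on the default port, so adjust -h as needed):

    # pending and active compactions, including the VALIDATION
    # compactions that repair runs
    nodetool -h localhost compactionstats

    # active streams to and from other nodes
    nodetool -h localhost netstats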
> INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java
> (line 666) [repair #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will
> sync /X.X.1.113, /X.X.0.71 on range
> (85070591730234615865843651857942052964,0] for ( .. )

I am assuming this was run on the first node in DC west with -pr, as you said. The log message is saying it is going to repair the primary range for that node. The repair is then actually performed one CF at a time. You should also see log messages ending with "range(s) out of sync" which will say how out of sync the data is.

> - how do I clean up my sstables? (grew from 6k to 20k since this started,
> while I shut writes off completely)

Sounds like repair is streaming a lot of differences. If you have the space, I would give levelled compaction time to take care of it.

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/12/2012, at 1:32 AM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:

> hi list,
>
> AntiEntropyService started syncing ranges of entire nodes (?!) across my
> data centers and I'd like to understand why.
>
> I see log lines like this on all my nodes in my two (east/west) data
> centres...
>
> INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java
> (line 666) [repair #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will
> sync /X.X.1.113, /X.X.0.71 on range
> (85070591730234615865843651857942052964,0] for ( .. )
>
> (this is around 80-100 GB of data for a single node)
>
> - I did not observe any network failures or nodes falling off the ring
> - good distribution of data (load is equal on all nodes)
> - hinted handoff is on
> - read repair chance is 0.1 on the CF
> - 2 replicas in each data centre (which is also the number of nodes in each)
> with NetworkTopologyStrategy
> - repair -pr is scheduled to run off-peak hours, daily
> - leveled compaction with sstable max size 256MB (I have found this to
> trigger compaction at acceptable intervals while still keeping the sstable
> count down)
> - I am on 1.1.6
> - Java heap 10G
> - max memtables 2G
> - 1G row cache
> - 256M key cache
>
> My nodes' ranges are:
>
> DC west
> 0
> 85070591730234615865843651857942052864
>
> DC east
> 100
> 85070591730234615865843651857942052964
>
> Symptoms are:
> - logs show sstables being streamed over to other nodes
> - 140k files in the data dir of the CF on all nodes
> - cfstats reports 20k sstables, up from 6k, on all nodes
> - compaction continuously running with no results whatsoever (number of
> sstables growing)
>
> I tried the following:
> - offline scrub (went OOM; I noticed the script in the Debian package
> specifies a 256MB heap?)
> - online scrub (no effect)
> - repair (no effect)
> - cleanup (no effect)
>
> My questions are:
> - how do I stop repair before I run out of storage? (can't let this finish)
> - how do I clean up my sstables? (grew from 6k to 20k since this started,
> while I shut writes off completely)
>
> thanks,
> Andras
>
> Andras Szerdahelyi
> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
> M: +32 493 05 50 88 | Skype: sandrew84
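P.S. A couple of quick ways to keep an eye on things while compaction settles, as a rough sketch (assumes the default Debian log location; adjust the path and host for your install):

    # see how far out of sync repair thinks the replicas are
    grep "out of sync" /var/log/cassandra/system.log

    # watch the SSTable count come down as levelled compaction catches up
    nodetool -h localhost cfstats | grep "SSTable count"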