I am on DSE, and I am referring to the json manifest ... but my memory isn't very good, so I could have the name wrong. We are hitting this bug: https://issues.apache.org/jira/browse/CASSANDRA-3306
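For anyone checking that manifest by hand, here is a minimal sketch of how one might eyeball it, assuming the 1.1.x leveled-compaction layout (a <cf>.json file in the column family data directory with a top-level "generations" list of {"generation": N, "members": [...]} entries). The path and key names here are assumptions from memory, so adjust to whatever your install actually has:

    # Sketch: summarise the leveled-compaction manifest (what Todd calls solr.json
    # below) per level, and compare against the number of -Data.db files sitting
    # next to it. Assumes the 1.1.x manifest layout, i.e. a top-level
    # {"generations": [{"generation": N, "members": [...]}, ...]} structure --
    # that layout and the example path are assumptions, adjust to your install.
    import glob, json, os, sys

    manifest_path = sys.argv[1]   # e.g. /var/lib/cassandra/data/<keyspace>/<cf>/<cf>.json
    data_dir = os.path.dirname(manifest_path)

    with open(manifest_path) as f:
        manifest = json.load(f)

    in_manifest = 0
    for level in manifest.get("generations", []):
        members = level.get("members", [])
        print("level %2d: %6d sstables" % (level.get("generation", -1), len(members)))
        in_manifest += len(members)

    on_disk = len(glob.glob(os.path.join(data_dir, "*-Data.db")))
    print("sstables in manifest: %d, -Data.db files on disk: %d" % (in_manifest, on_disk))

If the two totals disagree badly, or a level lists sstables that no longer exist, that would line up with the manifest-corruption tickets mentioned in this thread.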
On Wed, Dec 19, 2012 at 8:17 AM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:

> Solr? Are you on DSE, or am I missing something ( huge ) about Cassandra? ( wouldn't be the first time :-) )
>
> Or do you mean the json manifest? It's there and it looks OK; in fact it has been corrupted twice due to storage problems and I hit https://issues.apache.org/jira/browse/CASSANDRA-5041
> TBH I think this was a repair without -pr.
>
> thanks,
> Andras
>
> Andras Szerdahelyi
> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
> M: +32 493 05 50 88 | Skype: sandrew84
>
> On 18 Dec 2012, at 22:09, B. Todd Burruss <bto...@gmail.com> wrote:
>
> In your data directory, for each keyspace there is a solr.json. Cassandra stores the SSTables it knows about there when using leveled compaction. Take a look at that file and see if it looks accurate. If not, this is a bug with Cassandra that we are checking into as well.
>
> On Thu, Dec 6, 2012 at 7:38 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> The log message matches what I would expect to see for nodetool -pr.
>>
>> Not using -pr means repairing all the ranges the node is a replica for. If you have RF == number of nodes, then it will repair all the data. [see the primary-range sketch after the quoted thread]
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 6/12/2012, at 9:42 PM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:
>>
>> Thanks!
>>
>> I'm also thinking a repair run without -pr could have caused this, maybe?
>>
>> Andras Szerdahelyi
>> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
>> M: +32 493 05 50 88 | Skype: sandrew84
>>
>> On 06 Dec 2012, at 04:05, aaron morton <aa...@thelastpickle.com> wrote:
>>
>> - how do i stop repair before i run out of storage? ( can't let this finish )
>>
>> To stop the validation part of the repair…
>>
>> nodetool -h localhost stop VALIDATION
>>
>> The only way I know to stop streaming is to restart the node; there may be a better way though.
>>
>> INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java (line 666) [repair #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will sync /X.X.1.113, /X.X.0.71 on range (85070591730234615865843651857942052964,0] for ( .. )
>>
>> I am assuming this was run on the first node in DC west with -pr, as you said.
>> The log message is saying this is going to repair the primary range for the node. The repair is then actually performed one CF at a time.
>>
>> You should also see log messages ending with "range(s) out of sync", which will say how out of sync the data is.
>>
>> - how do i clean up my sstables ( grew from 6k to 20k since this started, while i shut writes off completely )
>>
>> Sounds like repair is streaming a lot of differences.
>> If you have the space, I would give leveled compaction time to take care of it.
>>
>> Hope that helps.
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 6/12/2012, at 1:32 AM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:
>>
>> hi list,
>>
>> AntiEntropyService started syncing ranges of entire nodes ( ?! ) across my data centers and I'd like to understand why.
>>
>> I see log lines like this on all my nodes in my two ( east/west ) data centres...
>>
>> INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java (line 666) [repair #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will sync /X.X.1.113, /X.X.0.71 on range (85070591730234615865843651857942052964,0] for ( .. )
>>
>> ( this is around 80-100 GB of data for a single node. )
>>
>> - i did not observe any network failures or nodes falling off the ring
>> - good distribution of data ( load is equal on all nodes )
>> - hinted handoff is on
>> - read repair chance is 0.1 on the CF
>> - 2 replicas in each data centre ( which is also the number of nodes in each ) with NetworkTopologyStrategy
>> - repair -pr is scheduled to run off-peak hours, daily
>> - leveled compaction with sstable max size 256mb ( i have found this to trigger compaction at acceptable intervals while still keeping the sstable count down )
>> - i am on 1.1.6
>> - java heap 10G
>> - max memtables 2G
>> - 1G row cache
>> - 256M key cache
>>
>> my nodes' ranges are:
>>
>> DC west: 0, 85070591730234615865843651857942052864
>> DC east: 100, 85070591730234615865843651857942052964
>>
>> symptoms are:
>> - logs show sstables being streamed over to other nodes
>> - 140k files in the data dir of the CF on all nodes [see the file-component breakdown after the quoted thread]
>> - cfstats reports 20k sstables, up from 6k, on all nodes
>> - compaction continuously running with no results whatsoever ( number of sstables growing )
>>
>> i tried the following:
>> - offline scrub ( went OOM; i noticed the script in the debian package specifies a 256MB heap? )
>> - online scrub ( no effect )
>> - repair ( no effect )
>> - cleanup ( no effect )
>>
>> my questions are:
>> - how do i stop repair before i run out of storage? ( can't let this finish )
>> - how do i clean up my sstables? ( grew from 6k to 20k since this started, while i shut writes off completely )
>>
>> thanks,
>> Andras
>>
>> Andras Szerdahelyi
>> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
>> M: +32 493 05 50 88 | Skype: sandrew84
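To make Aaron's point about -pr concrete, here is a rough sketch using nothing but the four tokens listed in the thread; it is plain arithmetic, not Cassandra or DSE code, and the node names are made up:

    # Rough illustration using the four tokens listed in this thread
    # (RandomPartitioner, so the ring wraps around). A node's primary range
    # is (previous token on the ring, its own token]. Node names are made up.
    tokens = {
        "west-1": 0,
        "west-2": 85070591730234615865843651857942052864,
        "east-1": 100,
        "east-2": 85070591730234615865843651857942052964,
    }

    ring = sorted(tokens.items(), key=lambda kv: kv[1])
    for i, (node, tok) in enumerate(ring):
        prev = ring[i - 1][1]    # i == 0 wraps around to the last token
        print("%s  primary range: (%d, %d]" % (node, prev, tok))

    # 'repair -pr' on a node repairs only its primary range printed above. Note
    # that west-1's primary range is exactly the
    # (85070591730234615865843651857942052964,0] from the log line, matching
    # Aaron's reading that the session was started with -pr on the first west node.
    # 'repair' without -pr repairs every range the node holds a replica for;
    # with 2 nodes per DC and RF 2 per DC, that is all four ranges, i.e. all data.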
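On the "140k files in the data dir vs 20k sstables in cfstats" symptom: each sstable on disk is several component files (Data.db, Index.db, Filter.db, Statistics.db and so on in 1.1.x), so those two numbers are not necessarily inconsistent with each other. A quick sketch to break a column family directory down by component; the path layout and naming convention assumed here are the 1.1.x ones as I remember them:

    # Sketch: break a column family data directory down by sstable component,
    # to sanity-check "140k files" against "~20k sstables" -- each sstable is
    # several files. The 1.1.x naming convention is an assumption here.
    import collections, os, sys

    data_dir = sys.argv[1]   # e.g. /var/lib/cassandra/data/<keyspace>/<cf> (path is an assumption)

    by_component = collections.Counter()
    for name in os.listdir(data_dir):
        if "-" not in name:
            continue                          # skip things like the .json manifest
        component = name.rsplit("-", 1)[-1]   # e.g. 'Data.db', 'Index.db', 'Digest.sha1'
        by_component[component] += 1

    for component, count in sorted(by_component.items()):
        print("%-20s %8d" % (component, count))
    print("%-20s %8d" % ("total files", sum(by_component.values())))

If the -Data.db count roughly matches what cfstats reports, the file count itself is expected; the real problem is the sstable count growing while compaction makes no progress.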