Atul,

our fork has been tested on 2.1 and 3.0.x clusters. I've just tested it against a CCM 3.6 cluster and it worked without issue.
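If you want to reproduce the test on your side, a throwaway cluster can be spun up with ccm roughly like this (the cluster name and node count are just examples):

    # create and start a local 3-node Cassandra 3.6 cluster
    ccm create reaper-test -v 3.6 -n 3 -s
    # check that all nodes are up before pointing Reaper at it
    ccm status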
With Reaper, if you set incremental to false, it'll perform a full subrange repair with no anticompaction. You'll see this message in the logs:

INFO [AntiEntropyStage:1] 2016-09-29 16:11:34,950 ActiveRepairService.java:378 - Not a global repair, will not do anticompaction

If you set incremental to true, it'll perform an incremental repair, one node at a time, with anticompaction (set Parallelism to Parallel exclusively with incremental repair).
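For example, once a segment has run with incremental set to false, you can confirm on the node that no anticompaction was triggered, and compare with a manual full subrange repair (the tokens and keyspace are placeholders, and the log path assumes a package install):

    # the "not a global repair" message means no anticompaction took place
    grep "will not do anticompaction" /var/log/cassandra/system.log
    # equivalent manual full subrange repair on a single token range
    nodetool repair -full -st <start_token> -et <end_token> <keyspace>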
Let me know how it goes.

On Thu, Sep 29, 2016 at 3:06 PM Atul Saroha <atul.sar...@snapdeal.com> wrote:

> Hi Alexander,
>
> There is a compatibility issue raised with spotify/cassandra-reaper for Cassandra version 3.x. Is it compatible with 3.6 in the thelastpickle/cassandra-reaper fork?
>
> There are some suggestions mentioned by *brstgt* which we can try on our side.
>
> On Thu, Sep 29, 2016 at 5:42 PM, Atul Saroha <atul.sar...@snapdeal.com> wrote:
>
>> Thanks Alexander.
>>
>> Will look into all these.
>>
>> On Thu, Sep 29, 2016 at 4:39 PM, Alexander Dejanovski <a...@thelastpickle.com> wrote:
>>
>>> Atul,
>>>
>>> Since you're using 3.6, by default you're running incremental repair, which doesn't like concurrency very much.
>>> Validation errors don't occur on a per-partition or per-partition-range basis; they occur when you try to run both an anticompaction and a validation compaction on the same SSTable.
>>>
>>> As advised to Robert yesterday, if you want to keep on running incremental repair, I'd suggest the following:
>>>
>>> - run nodetool tpstats on all nodes in search of running/pending repair sessions
>>> - if you have some, and to be sure you avoid conflicts, perform a rolling restart of your cluster (all nodes)
>>> - then, run "nodetool repair" on one node
>>> - when repair has finished on this node (track messages in the log and nodetool tpstats), check whether other nodes are running anticompactions
>>> - if so, wait until they are over
>>> - if not, move on to the next node
>>>
>>> You should be able to run concurrent incremental repairs on different tables if you wish to speed up the complete repair of the cluster, but do not try to repair the same table/full keyspace from two nodes at the same time.
>>>
>>> If you do not want to keep using incremental repair and prefer to fall back to classic full repair, I think the only way in 3.6 to avoid anticompaction is to use subrange repair (Paulo mentioned that in 3.x full repair also triggers anticompaction).
>>>
>>> You have two options here: cassandra_range_repair (https://github.com/BrianGallew/cassandra_range_repair) and Spotify Reaper (https://github.com/spotify/cassandra-reaper).
>>>
>>> cassandra_range_repair might scream about subrange + incremental not being compatible (not sure here), but you can modify the repair_range() method by adding a --full switch to the command line used to run repair.
>>>
>>> We have a fork of Reaper that handles both full subrange repair and incremental repair here: https://github.com/thelastpickle/cassandra-reaper
>>> It comes with a tweaked version of the UI made by Stephan Podkowinski (https://github.com/spodkowinski/cassandra-reaper-ui), which eases scheduling, running and tracking repairs, and adds fields to run incremental repair (accessible via ...:8080/webui/ in your browser).
>>>
>>> Cheers,
>>>
>>> On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha <atul.sar...@snapdeal.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are not sure whether this issue is linked to that node or not. Our application does frequent deletes and inserts.
>>>>
>>>> Maybe our approach to nodetool repair is not correct. Yes, we generally fire repair on all boxes at the same time. Till now, it was manual with the default configuration (command: "nodetool repair").
>>>> Yes, we saw validation errors, but those are linked to a repair of the same partition range already running on another box: validation failed with some IP because a repair was already running for the same SSTable.
>>>> Just a few days back, we had 2 DCs with 3 nodes each and a replication factor of 3, which means all data is on each node.
>>>>
>>>> On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <a...@thelastpickle.com> wrote:
>>>>
>>>>> Hi Atul,
>>>>>
>>>>> Could you be more specific on how you are running repair? What's the precise command line for that, does it run on several nodes at the same time, etc.?
>>>>> What is your gc_grace_seconds?
>>>>> Do you see errors in your logs that would be linked to repairs (validation failure or failure to create a Merkle tree)?
>>>>>
>>>>> You seem to mention a single node that went down, but say the whole cluster seems to have zombie data.
>>>>> What is the connection you see between the node that went down and the fact that deleted data comes back to life?
>>>>> What is your strategy for regular maintenance repairs (schedule, command line or tool, etc.)?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha <atul.sar...@snapdeal.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have seen weird behaviour in Cassandra 3.6.
>>>>>> One of our nodes went down for more than 10 hrs. After that, we ran nodetool repair multiple times, but tombstones are not getting synced properly across the cluster. On a day-to-day basis, on expiry of every grace period, deleted records start surfacing again in Cassandra.
>>>>>>
>>>>>> It seems nodetool repair is not syncing tombstones across the cluster. FYI, we have 3 data centres now.
>>>>>>
>>>>>> We just want help on how to verify and debug this issue. Help will be appreciated.
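Regarding the resurfacing deletes in the original message above, it can also help to double-check the table's gc_grace_seconds and look at recent repair activity on each node, for example (keyspace/table names are placeholders, and the log path assumes a package install):

    # check gc_grace_seconds for the affected table
    cqlsh -e "SELECT gc_grace_seconds FROM system_schema.tables WHERE keyspace_name='<keyspace>' AND table_name='<table>';"
    # look at recent repair-related log lines on the node
    grep -i "repair" /var/log/cassandra/system.log | tail -n 30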
--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com