Romain, I started running a new repair. If I see such behavior again, I will try what you mentioned.
Thanks.

On Wed, Sep 21, 2016 at 9:51 AM, Romain Hardouin <romainh...@yahoo.fr> wrote:

> Do you see any pending AntiEntropySessions (not AntiEntropyStage) with
> nodetool tpstats on nodes?
>
> Romain
>
>
> On Wednesday, September 21, 2016 at 16:45, "Li, Guangxing"
> <guangxing...@pearson.com> wrote:
>
> Alain,
>
> my script actually greps through all the log files, including the rotated
> system.log.* files. So it was probably due to a failed session. My script
> now assumes the repair has finished (possibly due to failure) if it does
> not see any more repair-related log lines for 2 hours.
>
> Thanks.
>
> George.
>
> On Wed, Sep 21, 2016 at 3:03 AM, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
> Hi George,
>
> That's the best way to monitor repairs "out of the box" I could think of.
> When you're not seeing 2048 (in your case), it might be due to log rotation
> or to a session failure. Have you had a look at repair failures?
>
> > I am wondering why the implementor did not put something in the log (e.g.
> > ... Repair command #41 has ended...) to clearly state that the repair has
> > completed.
>
> +1, and some information about the ranges successfully repaired and the
> ranges that failed would be a very good thing as well. It would then be
> easy to read the repair result and know what to do next (re-run repair on
> some ranges, move to the next node, etc.).
>
>
> 2016-09-20 17:00 GMT+02:00 Li, Guangxing <guangxing...@pearson.com>:
>
> Hi,
>
> I am using version 2.0.9. I have been looking into the logs to see if a
> repair is finished. Each time a repair is started on a node, I see a
> log line like "INFO [Thread-112920] 2016-09-16 19:00:43,805
> StorageService.java (line 2646) Starting repair command #41, repairing 2048
> ranges for keyspace groupmanager" in system.log. So I know that I am
> expecting to see 2048 log lines like "INFO [AntiEntropySessions:109]
> 2016-09-16 19:27:20,662 RepairSession.java (line 282) [repair
> #8b910950-7c43-11e6-88f3-f147ea74230b] session completed successfully".
> Once I see 2048 such log lines, I know this repair has completed. But this
> is not dependable, since sometimes I see fewer than 2048 even though I know
> there is no repair going on, because I have not seen any trace of repair in
> system.log for a long time. So it seems to me that there is a clear way to
> tell that a repair has started but no clear way to tell that a repair
> has ended. The only thing you can do is watch the log, and if you do not
> see repair activity for a long time, the repair is done somehow. I am
> wondering why the implementor did not put something in the log (e.g. ...
> Repair command #41 has ended...) to clearly state that the repair has
> completed.
>
> Thanks.
>
> George.
>
> On Tue, Sep 20, 2016 at 2:54 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>
> On Mon, Sep 19, 2016 at 3:07 PM Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
> ...
>
> - The size of your data
> - The number of vnodes
> - The compaction throughput
> - The streaming throughput
> - The hardware available
> - The load of the cluster
> - ...
>
>
> I've also heard that the number of clustering keys per partition key could
> have an impact. Might be worth investigating.
>
> Cheers,
> Jens
> --
> Jens Rantil
> Backend Developer @ Tink
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> For urgent matters you can reach me at +46-708-84 18 32.
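
For anyone wanting to try the log-based check described in this thread, here is a minimal sketch in Python. It is not George's actual script. It assumes the two message formats quoted above (Cassandra 2.0.9; other versions may word them differently), that log lines are fed in chronological order (rotated system.log.* files first), and that only one repair command runs on the node at a time, since the session lines do not carry the command number. The names `repair_status` and `RepairStatus` and the state labels are made up for illustration.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple, Optional

# Patterns taken from the two log lines quoted in the thread (2.0.9 wording).
START_RE = re.compile(
    r"Starting repair command #(\d+), repairing (\d+) ranges for keyspace (\S+)"
)
DONE_RE = re.compile(r"\[repair #[0-9a-f-]+\] session completed successfully")
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")


class RepairStatus(NamedTuple):
    command: Optional[int]  # number of the most recent repair command seen
    expected: int           # ranges announced by the "Starting repair" line
    completed: int          # "session completed successfully" lines counted
    state: str              # hypothetical labels, see repair_status()


def repair_status(
    lines: Iterable[str],
    now: datetime,
    idle_limit: timedelta = timedelta(hours=2),
) -> RepairStatus:
    """Summarise the most recent repair command found in the log lines.

    States: "not_started", "running", "completed", or "stalled" (no
    repair-related line for longer than idle_limit, so the repair is
    assumed to be over, possibly because a session failed).
    """
    command, expected, completed, last_seen = None, 0, 0, None
    for line in lines:
        if "repair" not in line.lower():
            continue  # only repair-related lines count as activity
        stamp = TS_RE.search(line)
        if stamp:
            last_seen = datetime.strptime(stamp.group(1), "%Y-%m-%d %H:%M:%S")
        start = START_RE.search(line)
        if start:
            # A new command resets the counter for completed sessions.
            command, expected, completed = int(start.group(1)), int(start.group(2)), 0
        elif command is not None and DONE_RE.search(line):
            completed += 1

    if command is None:
        state = "not_started"
    elif completed >= expected:
        state = "completed"
    elif last_seen is not None and now - last_seen > idle_limit:
        state = "stalled"
    else:
        state = "running"
    return RepairStatus(command, expected, completed, state)
```

To use it, pass an open log file (or several concatenated in order) together with the current time, e.g. `repair_status(open("system.log"), datetime.now())`, where the path is whatever your installation uses. A "stalled" result is the 2-hour heuristic from this thread: it only says nothing repair-related was logged recently, not why, so checking for pending AntiEntropySessions with nodetool tpstats as Romain suggested is still worthwhile.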