Do you see any pending AntiEntropySessions (not AntiEntropyStage) with nodetool 
tpstats on nodes?
Romain
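
For anyone wanting to script this check, here is a minimal sketch. It assumes the usual tpstats column layout (pool name, then Active, then Pending, ...), which may differ between versions; the function names are hypothetical.

```python
import subprocess

def pending_repair_sessions(tpstats_output, pool="AntiEntropySessions"):
    """Return (active, pending) for the given pool, or None if it is not listed."""
    for line in tpstats_output.splitlines():
        fields = line.split()
        # Assumed layout: name, active, pending, completed, blocked, all time blocked.
        if len(fields) >= 3 and fields[0] == pool:
            return int(fields[1]), int(fields[2])
    return None

def read_tpstats(host="127.0.0.1"):
    # Calls the nodetool CLI; requires nodetool on the PATH of the calling user.
    result = subprocess.run(
        ["nodetool", "-h", host, "tpstats"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Something like pending_repair_sessions(read_tpstats()) could then be polled on each node; a non-zero active or pending count would suggest a repair session is still in flight.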
 

    On Wednesday, 21 September 2016 at 16:45, "Li, Guangxing" 
<guangxing...@pearson.com> wrote:
 

 Alain,
my script actually greps through all the log files, including the rotated 
system.log.* files, so the shortfall was probably due to a failed session. My 
script now assumes the repair has finished (possibly due to failure) if it sees 
no more repair-related log lines for 2 hours.
Thanks.
George.
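
The two-hour idle heuristic described above could be sketched roughly as follows. This is not George's actual script: the function names are hypothetical, the timestamp format is taken from the log lines quoted in this thread, and "repair-related" is assumed to mean any line containing the word "repair".

```python
import re
from datetime import datetime, timedelta

# Timestamp layout as it appears in the quoted 2.0.9 log lines.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
# Assumption: a line counts as repair-related if it mentions "repair" at all.
REPAIR_HINT = re.compile(r"repair", re.IGNORECASE)

def last_repair_activity(lines):
    """Return the timestamp of the newest repair-related line, or None."""
    latest = None
    for line in lines:
        if not REPAIR_HINT.search(line):
            continue
        match = TIMESTAMP.search(line)
        if match is None:
            continue
        ts = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")
        if latest is None or ts > latest:
            latest = ts
    return latest

def repair_looks_finished(lines, now, idle=timedelta(hours=2)):
    """True if repair lines exist and the newest one is older than `idle`."""
    latest = last_repair_activity(lines)
    return latest is not None and now - latest > idle
```

Note that this cannot tell a successful repair from a failed one; it only reports that repair logging has gone quiet.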
On Wed, Sep 21, 2016 at 3:03 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

Hi George,
That's the best way to monitor repairs "out of the box" I could think of. When 
you're not seeing 2048 (in your case), it might be due to log rotation or to a 
session failure. Have you had a look at repair failures?

I am wondering why the implementor did not put something in the log (e.g. ... 
Repair command #41 has ended...) to clearly state that the repair has completed.

+1, and some information about ranges successfully repaired and the ranges 
that failed could be a very good thing as well. It would be easy to then read 
the repair result and to know what to do next (re-run repair on some ranges, 
move to the next node, etc).

2016-09-20 17:00 GMT+02:00 Li, Guangxing <guangxing...@pearson.com>:

Hi,
I am using version 2.0.9. I have been looking into the logs to see if a repair 
is finished. Each time a repair is started on a node, I see a log line like 
"INFO [Thread-112920] 2016-09-16 19:00:43,805 StorageService.java (line 2646) 
Starting repair command #41, repairing 2048 ranges for keyspace groupmanager" 
in system.log. So I know that I am expecting to see 2048 log lines like "INFO 
[AntiEntropySessions:109] 2016-09-16 19:27:20,662 RepairSession.java (line 282) 
[repair #8b910950-7c43-11e6-88f3-f147ea74230b] session completed 
successfully". Once I see 2048 such log lines, I know this repair has 
completed. But this is not dependable: sometimes I see fewer than 2048, yet I 
know no repair is going on because there has been no trace of repair in 
system.log for a long time. So it seems to me that there is a clear way to tell 
that a repair has started but no clear way to tell that a repair has ended. The 
only thing you can do is watch the log and, if you do not see repair activity 
for a long time, assume the repair is done somehow. I am 
wondering why the implementor did not put something in the log (e.g. ... Repair 
command #41 has ended...) to clearly state that the repair has completed.
Thanks.
George.
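
The counting approach described in the message above could be sketched like this. It is only an illustration: the two patterns are derived from the log lines quoted in this thread (version 2.0.9), the function name is hypothetical, and lines are assumed to be fed oldest-first across system.log and the rotated system.log.* files.

```python
import re

# Patterns derived from the two log lines quoted in this thread.
START = re.compile(
    r"Starting repair command #(\d+), repairing (\d+) ranges for keyspace (\S+)"
)
DONE = re.compile(r"\[repair #[0-9a-f-]+\] session completed successfully")

def repair_progress(lines):
    """Return (expected, completed) for the most recent repair command, or None."""
    expected = None
    completed = 0
    for line in lines:
        start = START.search(line)
        if start:
            # A new repair command resets the count of completed sessions.
            expected = int(start.group(2))
            completed = 0
        elif expected is not None and DONE.search(line):
            completed += 1
    if expected is None:
        return None
    return expected, completed
```

Because the quoted session lines do not carry the repair command number, this sketch cannot separate overlapping repairs, and a completed count below the expected count may mean either a failed session or a repair still in progress.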
On Tue, Sep 20, 2016 at 2:54 AM, Jens Rantil <jens.ran...@tink.se> wrote:

On Mon, Sep 19, 2016 at 3:07 PM Alain RODRIGUEZ <arodr...@gmail.com> wrote:

...
- The size of your data
- The number of vnodes
- The compaction throughput
- The streaming throughput
- The hardware available
- The load of the cluster
- ...

I've also heard that the number of clustering keys per partition key could have 
an impact. Might be worth investigating.
Cheers,
Jens
-- 
Jens Rantil
Backend Developer @ Tink
Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.







   
