Ian,

In my experience, repair (2.0.7) gives no useful output until the keyspace is finished. Perhaps this has since been solved, but we do something much more painful:
We tail the log on the node where repair is running, watch for the first repair session, and then count each "session completed" line. Each keyspace being repaired will produce num_tokens worth of messages.

Find the start time:

    $ grep AntiEntropy /var/log/cassandra/system.log | grep -m 1 "new session"
    INFO [AntiEntropySessions:1] 2015-01-06 08:00:01,817 RepairSession.java (line 244) [repair #1c1023c0-95b0-11e4-abc7-9d8c76a06ae7] new session: will sync /10.x.y.z, /10.x.y.z on range (2770269247941187446,2771538486312712323] for menomena.[x, y, z]

Note: you have to catch the *first* message; there will be more to follow. It would be great if the log output had a differentiator to tell the initial start of a repair from the start of a new range.

So start_time = 2015-01-06 08:00:01,817

From there you count "session completed" messages:

    $ grep AntiEntropy /var/log/cassandra/system.log | grep "session completed" | wc -l

Each line being counted looks like this:

    INFO [AntiEntropySessions:192] 2015-01-06 14:35:13,874 RepairSession.java (line 282) [repair #1c1023c0-95b0-11e4-abc7-9d8c76a06ae7] session completed successfully

Since I have num_tokens=256, if I see a count of 412, I know that OpsCenter (256) is finished and menomena (256) has completed 156 ranges, so it is about 60% finished with about 40% to go.

As Jan said, you could then use this to calculate the remaining time from the start time and the remainder of the ranges.

I've found this gives me an immediate indication of progress, rather than having to wait for the keyspace to be finished. We are running 2.0.7; maybe some of this has since been exposed through nodetool repair (which would be sweet).

This seems to be more or less accurate, but please correct me if I am wrong.

We use this more to automatically detect long-running repairs than simply to watch progress; our internal Zabbix server then whines about them to my team.

Jason Kushmaul | V.P. Mobile Engineering
4050 Hunsaker Drive | East Lansing, MI 48823 USA
517-337-2701 x 5225 | 517-337-2754 (fax)

From: Jan [mailto:cne...@yahoo.com]
Sent: Thursday, March 19, 2015 4:04 PM
To: user@cassandra.apache.org
Subject: Re: best way to measure repair times?

Ian;

To respond to your specific question: you could pipe the output of your repair into a file and subsequently determine the time taken.

Example:

    nodetool repair -dc DC1
    [2014-07-24 21:59:55,326] Nothing to repair for keyspace 'system'
    [2014-07-24 21:59:55,617] Starting repair command #2, repairing 490 ranges for keyspace system_traces (seq=true, full=true)
    [2014-07-24 22:23:14,299] Repair session 323b9490-137e-11e4-88e3-c972e09793ca for range (820981369067266915,822627736366088177] finished
    [2014-07-24 22:23:14,320] Repair session 38496a61-137e-11e4-88e3-c972e09793ca for range (2506042417712465541,2515941262699962473] finished

What to look for:
a) Look for the specific name of the keyspace and the words 'Starting repair'.
b) Look for the word 'finished'.
c) Compute the average time per keyspace, and you will have a rough idea of how long your repairs take on a regular basis. This applies only to continual operational repair, not to the first time it's done.

Hope this helps,
Jan/

On Thursday, March 19, 2015 12:55 PM, Paulo Motta <pauloricard...@gmail.com> wrote:

From: http://www.datastax.com/dev/blog/modern-hinted-handoff

Repair and the fine print

At first glance, it may appear that Hinted Handoff lets you safely get away without needing repair. This is only true if you never have hardware failure. Hardware failure means that

1. We lose "historical" data for which the write has already finished, so there is nothing to tell the rest of the cluster exactly what data has gone missing
2. We can also lose hints-not-yet-replayed from requests the failed node coordinated

With sufficient dedication, you can get by with "only run repair after hardware failure and rely on hinted handoff the rest of the time," but as your clusters grow (and hardware failure becomes more common) performing repair as a one-off special case will become increasingly difficult to do perfectly. Thus, we continue to recommend running a full repair weekly.

2015-03-19 16:42 GMT-03:00 Robert Coli <rc...@eventbrite.com>:

On Thu, Mar 19, 2015 at 12:13 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

Cassandra doesn't guarantee eventual consistency?

If you run regularly scheduled repair, it does. If you do not run repair, it does not.

Hinted handoff, for example, is considered an optimization for repair, and does not assert that it provides a consistency guarantee.

=Rob
http://twitter.com/rcolidba

--
Paulo Ricardo

--
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com
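
For anyone who wants to script the log-counting approach from Jason's message at the top of this thread, here is a minimal sketch. It rests on the same assumptions as that message: every keyspace produces exactly num_tokens repair sessions, and keyspaces are repaired one after another. The function names (count_completed, repair_progress, estimate_remaining_seconds) are made up for illustration and are not part of Cassandra or nodetool. The time estimate is a plain linear extrapolation, which may be off if ranges differ a lot in size.

```shell
#!/bin/sh
# Hypothetical helpers; names and structure are illustrative only.

# Count "session completed" lines written by the AntiEntropy threads.
# $1 = path to the Cassandra system.log
count_completed() {
    grep AntiEntropy "$1" | grep -c "session completed" || true
}

# Turn a completed-session count into per-keyspace progress.
# $1 = completed session count, $2 = num_tokens (256 in the thread)
# Assumes each keyspace yields exactly num_tokens sessions, in sequence.
repair_progress() {
    rp_completed=$1
    rp_tokens=$2
    rp_ks_done=$((rp_completed / rp_tokens))      # fully repaired keyspaces
    rp_ranges=$((rp_completed % rp_tokens))       # ranges done in current one
    rp_pct=$((rp_ranges * 100 / rp_tokens))       # integer percent, rounded down
    echo "keyspaces_done=$rp_ks_done ranges_done=$rp_ranges percent=$rp_pct"
}

# Linear extrapolation of the remaining time in seconds.
# $1 = seconds elapsed since the first "new session" line
# $2 = sessions completed so far, $3 = total sessions expected
estimate_remaining_seconds() {
    if [ "$2" -eq 0 ]; then
        echo unknown      # nothing finished yet, so no rate to extrapolate
        return
    fi
    echo $(( $1 * ($3 - $2) / $2 ))
}
```

With the numbers from the message, `repair_progress 412 256` reports one finished keyspace and 156 of 256 ranges done in the current one. The elapsed seconds have to come from the start time found with the first grep; how to compute that difference depends on the date tools available on the node.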