Hello Jimmy,

The parent_repair_history table records the start and finish information of each repair session. The other table, repair_history, tracks repair status as the session progresses. So you should first query parent_repair_history to check whether a repair started and finished, and how long it took, and then inspect repair_history to troubleshoot the specific details of a given repair session.
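As a rough illustration of that first step, here is a minimal Python sketch that takes one parent_repair_history row and reports whether the session finished and how long it took. It assumes the row has already been fetched (for example with a plain SELECT) and is held as a dict keyed by the column names from the schema quoted at the bottom of this message. The function name summarize_parent_repair and the sample values are hypothetical, not part of Cassandra.

```python
from datetime import datetime, timedelta
from typing import Optional


def summarize_parent_repair(row: dict) -> dict:
    """Report whether a parent repair session finished and how long it took.

    `row` is assumed to be a dict keyed by parent_repair_history column names.
    """
    started_at: Optional[datetime] = row.get("started_at")
    finished_at: Optional[datetime] = row.get("finished_at")

    # A session counts as finished only once finished_at has been written.
    finished = finished_at is not None

    # Duration is only meaningful when both timestamps are present.
    duration: Optional[timedelta] = None
    if finished and started_at is not None:
        duration = finished_at - started_at

    return {
        "parent_id": row.get("parent_id"),
        "keyspace_name": row.get("keyspace_name"),
        "finished": finished,
        "duration": duration,
    }


# Hypothetical sample row, for illustration only.
example = {
    "parent_id": "hypothetical-id",
    "keyspace_name": "ks1",
    "started_at": datetime(2016, 2, 25, 3, 0, 0),
    "finished_at": datetime(2016, 2, 25, 3, 12, 30),
}
print(summarize_parent_repair(example)["duration"])  # 0:12:30
```

A row whose finished_at is still empty comes back with finished set to False and no duration, which is the case you would then dig into through repair_history.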
Answering your questions below:

> Is every invocation of nodetool repair execution will be recorded as one
> entry in parent_repair_history CF regardless if it is across DC, local node
> repair, or other options ?

Actually two writes, one at start and one at finish. Since parent_id is the primary key in the schema you quoted, both end up in the same row: started_at is set first and finished_at is filled in later.

> A repair job is done only if "finished" column contains value? and a
> repair job is successfully done only if there is no value in
> exception_messages or exception_stacktrace ?

Correct. In the schema the columns are named finished_at, exception_message, and exception_stacktrace.

> what is the purpose of successful_ranges column? do i have to check they
> are all matched with requested_range to ensure a successful run?

Correct.

> Ultimately, how to find out the overall repair health/status in a given
> cluster?

Check whether repair is being executed on all nodes within gc_grace_seconds; if it is not, tune that value or troubleshoot the problems.

> Scanning through parent_repair_history and making sure all the known
> keyspaces has a good repair run in recent days?

Sounds good.

You can check https://issues.apache.org/jira/browse/CASSANDRA-5839 for more information.

2016-02-25 3:13 GMT-03:00 Jimmy Lin <y2klyf+w...@gmail.com>:

>
> hi all,
> few questions regarding how to read or digest the
> system_distributed.parent_repair_history CF, that I am very interested to
> use to find out our repair status...
>
> -
> Is every invocation of nodetool repair execution will be recorded as one
> entry in parent_repair_history CF regardless if it is across DC, local node
> repair, or other options ?
>
> -
> A repair job is done only if "finished" column contains value? and a
> repair job is successfully done only if there is no value in
> exception_messages or exception_stacktrace ?
> what is the purpose of successful_ranges column? do i have to check they
> are all matched with requested_range to ensure a successful run?
>
> -
> Ultimately, how to find out the overall repair health/status in a given
> cluster?
> Scanning through parent_repair_history and making sure all the known
> keyspaces has a good repair run in recent days?
>
> ---------------
> CREATE TABLE system_distributed.parent_repair_history (
>     parent_id timeuuid PRIMARY KEY,
>     columnfamily_names set<text>,
>     exception_message text,
>     exception_stacktrace text,
>     finished_at timestamp,
>     keyspace_name text,
>     requested_ranges set<text>,
>     started_at timestamp,
>     successful_ranges set<text>
> )
>
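
P.S. To make the checks discussed in the answers concrete, here is a hedged Python sketch of the scan over parent_repair_history. It assumes the rows have already been read into dicts keyed by the column names in the schema above. The helper names is_successful and keyspaces_needing_repair, the range strings, and the sample rows are hypothetical. The 864000-second value is Cassandra's default gc_grace_seconds and should be replaced with whatever your tables actually use.

```python
from datetime import datetime, timedelta

# Assumption: 864000 s (10 days) is the default gc_grace_seconds;
# use the value actually configured on your tables.
DEFAULT_GC_GRACE_SECONDS = 864000


def is_successful(row: dict) -> bool:
    """True if the session finished, recorded no exception, and
    every requested range also appears in successful_ranges."""
    if row.get("finished_at") is None:
        return False
    if row.get("exception_message") or row.get("exception_stacktrace"):
        return False
    requested = set(row.get("requested_ranges") or ())
    successful = set(row.get("successful_ranges") or ())
    return requested <= successful


def keyspaces_needing_repair(rows, keyspaces, now,
                             gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return the keyspaces with no successful session that finished
    within the last gc_grace_seconds."""
    cutoff = now - timedelta(seconds=gc_grace_seconds)
    recently_repaired = {
        row["keyspace_name"]
        for row in rows
        if is_successful(row) and row["finished_at"] >= cutoff
    }
    return sorted(set(keyspaces) - recently_repaired)


# Hypothetical sample data, for illustration only.
now = datetime(2016, 2, 25)
rows = [
    {"keyspace_name": "ks1", "finished_at": datetime(2016, 2, 20),
     "requested_ranges": {"(0,100]"}, "successful_ranges": {"(0,100]"}},
    {"keyspace_name": "ks2", "finished_at": datetime(2016, 2, 24),
     "requested_ranges": {"(0,100]", "(100,200]"},
     "successful_ranges": {"(0,100]"}},
]
print(keyspaces_needing_repair(rows, ["ks1", "ks2"], now))  # ['ks2']
```

Note that this only confirms each keyspace has at least one fully successful session in the window. It does not prove that every node or token range was covered, so treat it as a first-pass health check rather than a guarantee.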