Hello Jimmy,

The parent_repair_history table records the start and finish of each
repair session, while the repair_history table records the status of the
repair as it progresses. So first query parent_repair_history to check
whether a repair started and finished, and how long it took; then inspect
repair_history to troubleshoot the details of a given repair session.
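
As a rough sketch of that first step, the snippet below computes how long a
session took from a parent_repair_history row. It assumes the rows have
already been fetched (for example with the query held in PARENT_QUERY) and
are represented as plain dicts keyed by the column names in the schema. The
names PARENT_QUERY and repair_duration are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Query that could be used to fetch the rows; the column names come from
# the parent_repair_history schema quoted in this thread.
PARENT_QUERY = (
    "SELECT parent_id, keyspace_name, started_at, finished_at "
    "FROM system_distributed.parent_repair_history"
)


def repair_duration(row: dict) -> Optional[timedelta]:
    """Return how long the session took, or None if it has not finished."""
    started = row.get("started_at")
    finished = row.get("finished_at")
    if started is None or finished is None:
        return None
    return finished - started
```

A row with started_at set and finished_at still null yields None, which is
the signal to go and look at repair_history for that session.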

Answering your questions below:

> Is every invocation of nodetool repair execution will be recorded as one
> entry in parent_repair_history CF regardless if it is across DC, local node
> repair, or other options ?

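Yes, every invocation is recorded. Strictly speaking it is one row per
session (parent_id is the primary key) that is written twice: once when the
repair starts and again when it finishes.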

> A repair job is done only if "finished" column contains value? and a
> repair job is successfully done only if there is no value in
> exception_messages or exception_stacktrace ?

Correct.

> what is the purpose of successful_ranges column? do i have to check they
> are all matched with requested_range to ensure a successful run?

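Correct. A run is fully successful only when successful_ranges contains
every range listed in requested_ranges.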
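
The two checks above can be combined into one small classifier. This is
only a sketch: it assumes each row is a plain dict keyed by the column
names in the schema, and the function name repair_status and the four
labels it returns are made up for illustration.

```python
def repair_status(row: dict) -> str:
    """Classify one parent_repair_history row."""
    # Any recorded exception means the session failed.
    if row.get("exception_message") or row.get("exception_stacktrace"):
        return "failed"
    # No finish timestamp: still running, or it never recorded a finish.
    if row.get("finished_at") is None:
        return "unfinished"
    # Finished without an exception: every requested range must have succeeded.
    requested = set(row.get("requested_ranges") or ())
    successful = set(row.get("successful_ranges") or ())
    return "successful" if requested == successful else "incomplete"
```

Note that "unfinished" cannot tell a repair that is still in progress from
one whose coordinator went away before recording a finish, so an old
started_at with no finished_at deserves a closer look.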

> Ultimately, how to find out the overall repair health/status in a given
> cluster?

Check that repair completes on every node within gc_grace_seconds. If it
does not, either tune that value or troubleshoot why the repairs are late
or failing.

> Scanning through parent_repair_history and making sure all the known
> keyspaces has a good repair run in recent days?

Sounds good.
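
A scan like that could look roughly like the sketch below. It is
deliberately simplified: it assumes the rows are plain dicts keyed by the
schema's column names, it treats gc_grace_seconds as one value for all
keyspaces (it is really a per-table setting), and it does not check which
tables or ranges each session covered. The function name
keyspaces_needing_repair is made up for illustration.

```python
from datetime import datetime, timedelta


def keyspaces_needing_repair(rows, keyspaces, gc_grace_seconds, now):
    """Return the keyspaces with no clean, finished repair inside the window."""
    cutoff = now - timedelta(seconds=gc_grace_seconds)
    recently_repaired = set()
    for row in rows:
        finished = row.get("finished_at")
        failed = row.get("exception_message") or row.get("exception_stacktrace")
        # Count only sessions that finished without an exception after the cutoff.
        if finished is not None and not failed and finished >= cutoff:
            recently_repaired.add(row["keyspace_name"])
    return sorted(set(keyspaces) - recently_repaired)
```

Any keyspace this returns has had no clean repair recorded inside the
window and is the place to start troubleshooting.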

You can check https://issues.apache.org/jira/browse/CASSANDRA-5839 for more
information.


2016-02-25 3:13 GMT-03:00 Jimmy Lin <y2klyf+w...@gmail.com>:

>
> hi all,
> few questions regarding how to read or digest the
> system_distributed.parent_repair_history CF, that I am very interested to
> use to find out our repair status...
>
> -
> Is every invocation of nodetool repair execution will be recorded as one
> entry in parent_repair_history CF regardless if it is across DC, local node
> repair, or other options ?
>
> -
> A repair job is done only if "finished" column contains value? and a
> repair job is successfully done only if there is no value in
> exception_messages or exception_stacktrace ?
> what is the purpose of successful_ranges column? do i have to check they
> are all matched with requested_range to ensure a successful run?
>
> -
> Ultimately, how to find out the overall repair health/status in a given
> cluster?
> Scanning through parent_repair_history and making sure all the known
> keyspaces has a good repair run in recent days?
>
> ---------------
> CREATE TABLE system_distributed.parent_repair_history (
>     parent_id timeuuid PRIMARY KEY,
>     columnfamily_names set<text>,
>     exception_message text,
>     exception_stacktrace text,
>     finished_at timestamp,
>     keyspace_name text,
>     requested_ranges set<text>,
>     started_at timestamp,
>     successful_ranges set<text>
> )
>
