hi Paulo, 
one more follow-up ... :)
I noticed these tables are supposed to be replicated to all nodes in the 
cluster, and are not node-specific. 
How does it work when a repair job targets only local vs. all DCs? Is there 
any column or flag that tells the difference? Or does it actually matter?
thanks




> On Feb 25, 2016, at 10:37 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
> 
> > why will each repair job execution have 2 entries? I thought it would be 
> > one entry, beginning with the started_at column filled, and when it 
> > completes, the finished_at column is filled. 
> 
> that's correct, I was mistaken!
> 
> > Also, if my cluster has more than 1 keyspace, then given the way this 
> > table is structured, it will have multiple entries, one for each 
> > keyspace_name value, no? thanks
> 
> right, because repair sessions in different keyspaces will have different 
> repair session ids.
> 
> 2016-02-25 15:04 GMT-03:00 Jimmy Lin <y2k...@gmail.com>:
>> hi Paulo, 
>> follow up on the # of entries question... 
>> why will each repair job execution have 2 entries? I thought it would be 
>> one entry, beginning with the started_at column filled, and when it 
>> completes, the finished_at column is filled. 
>> Also, if my cluster has more than 1 keyspace, then given the way this table 
>> is structured, it will have multiple entries, one for each keyspace_name 
>> value, no? thanks
>> 
>>> On Feb 25, 2016, at 5:48 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
>>> 
>>> Hello Jimmy,
>>> 
>>> The parent_repair_history table keeps track of the start and finish 
>>> information of a repair session. The other table, repair_history, keeps 
>>> track of repair status as it progresses. So, you must first query the 
>>> parent_repair_history table to check whether a repair started and 
>>> finished, as well as its duration, and then inspect the repair_history 
>>> table to troubleshoot more specific details of a given repair session.
>>> 
>>> Answering your questions below:
>>> 
>>> > Will every invocation of nodetool repair be recorded as one entry in 
>>> > the parent_repair_history CF, regardless of whether it is a cross-DC 
>>> > repair, a local node repair, or uses other options?
>>> 
>>> Actually two entries, one for start and one for finish.
>>> 
>>> > Is a repair job done only if the "finished_at" column contains a value? 
>>> > And is a repair job successful only if there is no value in 
>>> > exception_message or exception_stacktrace?
>>> 
>>> correct
>>> 
>>> > What is the purpose of the successful_ranges column? Do I have to check 
>>> > that they all match requested_ranges to ensure a successful run?
>>> 
>>> correct
>>> 
>>> > Ultimately, how do I find out the overall repair health/status of a 
>>> > given cluster?
>>> 
>>> Check that repair is being executed on all nodes within gc_grace_seconds; 
>>> if it is not, tune that value or troubleshoot the problems.
>>> 
>>> > By scanning through parent_repair_history and making sure all the known 
>>> > keyspaces have had a good repair run in recent days?
>>> 
>>> Sounds good.
>>> 
>>> You can check https://issues.apache.org/jira/browse/CASSANDRA-5839 for more 
>>> information.
>>> 
>>> 
>>> 2016-02-25 3:13 GMT-03:00 Jimmy Lin <y2klyf+w...@gmail.com>:
>>>> 
>>>> hi all,
>>>> a few questions regarding how to read or digest the 
>>>> system_distributed.parent_repair_history CF, which I am very interested 
>>>> in using to find out our repair status... 
>>>>  
>>>> -
>>>> Will every invocation of nodetool repair be recorded as one entry in the 
>>>> parent_repair_history CF, regardless of whether it is a cross-DC repair, 
>>>> a local node repair, or uses other options?
>>>> 
>>>> -
>>>> Is a repair job done only if the "finished_at" column contains a value? 
>>>> And is a repair job successful only if there is no value in 
>>>> exception_message or exception_stacktrace?
>>>> What is the purpose of the successful_ranges column? Do I have to check 
>>>> that they all match requested_ranges to ensure a successful run?
>>>> 
>>>> -
>>>> Ultimately, how do I find out the overall repair health/status of a 
>>>> given cluster?
>>>> By scanning through parent_repair_history and making sure all the known 
>>>> keyspaces have had a good repair run in recent days?
>>>> 
>>>> ---------------
>>>> CREATE TABLE system_distributed.parent_repair_history (
>>>>     parent_id timeuuid PRIMARY KEY,
>>>>     columnfamily_names set<text>,
>>>>     exception_message text,
>>>>     exception_stacktrace text,
>>>>     finished_at timestamp,
>>>>     keyspace_name text,
>>>>     requested_ranges set<text>,
>>>>     started_at timestamp,
>>>>     successful_ranges set<text>
>>>> )
> 
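
Putting the thread's checks together: a run counts as good when finished_at 
is set, neither exception column has a value, and successful_ranges matches 
requested_ranges, and each known keyspace should have such a run within 
gc_grace_seconds. Below is a minimal Python sketch of that logic, not anything 
shipped with Cassandra. The function names are hypothetical, rows are assumed 
to be plain dicts keyed by the column names in the schema above (how they are 
fetched is left out), and the 864000-second window is Cassandra's default 
gc_grace_seconds, which individual tables may override.

```python
from datetime import datetime, timedelta

# Assumption: Cassandra's default gc_grace_seconds (10 days); tables may override it.
DEFAULT_GC_GRACE = timedelta(seconds=864000)


def repair_succeeded(row):
    """Return True if a parent_repair_history row looks like a clean, complete run.

    `row` is assumed to be a dict keyed by the table's column names.
    """
    if row.get("finished_at") is None:
        return False  # still running, or the run never recorded a finish
    if row.get("exception_message") or row.get("exception_stacktrace"):
        return False  # the run recorded an error
    requested = set(row.get("requested_ranges") or ())
    successful = set(row.get("successful_ranges") or ())
    return requested == successful  # every requested range was repaired


def keyspaces_missing_repair(rows, keyspaces, window, now):
    """Return, sorted, the keyspaces with no successful run finished within `window` of `now`."""
    cutoff = now - window
    repaired = {
        row["keyspace_name"]
        for row in rows
        if repair_succeeded(row) and row["finished_at"] >= cutoff
    }
    return sorted(set(keyspaces) - repaired)
```

Calling keyspaces_missing_repair(rows, known_keyspaces, DEFAULT_GC_GRACE, 
datetime.utcnow()) would list the keyspaces that still need attention. Since 
the table shown above has no node column, this only answers the per-keyspace 
question from the thread; it does not show which node ran a given repair.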
