    [ https://issues.apache.org/jira/browse/CASSANDRA-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955061#comment-17955061 ]

Jon Meredith commented on CASSANDRA-20312:
------------------------------------------

This change prevents incremental repairs that are not making any progress from
being auto-failed. Every hour,
`org.apache.cassandra.repair.consistent.LocalSessions#cleanup()` sends a status
check to all participants, and each check resets the session's last-updated
time. As a result, `repair_fail_timeout` (default 24 hours) never expires,
leaving stalled sessions around that conflict with new incremental repairs.
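To make the failure mode concrete, here is a minimal Java sketch. It is a toy
model, not Cassandra's code: `SessionModel`, `StalledRepairDemo`,
`onStatusResponse` and `hoursUntilAutoFail` are hypothetical names. It assumes
the cleanup pass runs exactly once per hour, that the session never makes
progress, and that the timeout check uses a strict `>` comparison. It compares
a session where status-check responses refresh the last-updated time against
one where they do not.

```java
import java.time.Duration;

// Hypothetical toy model of a consistent-repair session's liveness tracking.
// NOT Cassandra's LocalSessions implementation; names are illustrative only.
final class SessionModel {
    private final Duration failTimeout;
    private final boolean statusPingIsLiveness;
    private long lastUpdateMillis;

    SessionModel(Duration failTimeout, boolean statusPingIsLiveness, long startMillis) {
        this.failTimeout = failTimeout;
        this.statusPingIsLiveness = statusPingIsLiveness;
        this.lastUpdateMillis = startMillis;
    }

    /** A real state transition (progress) always refreshes the last-updated time. */
    void onProgress(long nowMillis) {
        lastUpdateMillis = nowMillis;
    }

    /** A status-check response refreshes liveness only if pings count as liveness. */
    void onStatusResponse(long nowMillis) {
        if (statusPingIsLiveness) {
            lastUpdateMillis = nowMillis;
        }
    }

    /** Assumed rule: fail once strictly more than failTimeout has passed without an update. */
    boolean shouldAutoFail(long nowMillis) {
        return nowMillis - lastUpdateMillis > failTimeout.toMillis();
    }
}

final class StalledRepairDemo {
    /**
     * Simulates a session that never makes progress, with an hourly cleanup pass
     * that checks the fail timeout and then sends a status check.
     * Returns the hour at which the session is auto-failed, or -1 if it is
     * still alive after maxHours.
     */
    static int hoursUntilAutoFail(boolean statusPingIsLiveness, int maxHours) {
        Duration failTimeout = Duration.ofHours(24); // repair_fail_timeout default
        long hourMillis = Duration.ofHours(1).toMillis();
        SessionModel session = new SessionModel(failTimeout, statusPingIsLiveness, 0L);
        for (int hour = 1; hour <= maxHours; hour++) {
            long now = hour * hourMillis;
            if (session.shouldAutoFail(now)) {
                return hour;
            }
            // Hourly status check; with pings as liveness this resets the clock.
            session.onStatusResponse(now);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("pings count as liveness: " + hoursUntilAutoFail(true, 72));
        System.out.println("pings ignored:           " + hoursUntilAutoFail(false, 72));
    }
}
```

Running `main` prints `-1` for the first case (the stalled session is never
failed, however long the loop runs) and `25` for the second (failed on the
first hourly pass after 24 hours have elapsed). Swapping the order of the
timeout check and the status check inside the loop does not change the first
result, since the gap between updates never exceeds one hour.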

[~mck] you mentioned "This has been confirmed with one user in production to
fix the situation of repairs (consistently) prematurely timing out." Were they
timing out before the 24-hour timer expired because of dropped messages? If so,
did the user try configuring repair retries (CASSANDRA-18816)?

I think we should revert this change and look for another solution.

> Long running repairs autofail prematurely
> -----------------------------------------
>
>                 Key: CASSANDRA-20312
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20312
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Berenguer Blasi
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0.18, 4.1.9, 5.0.4, 5.1
>
>
> Repairs will autofail after a long period of inactivity, but very long-running
> repairs may be incorrectly auto-failed. Capture status pings as liveness info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
