[ https://issues.apache.org/jira/browse/CASSANDRA-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955061#comment-17955061 ]
Jon Meredith commented on CASSANDRA-20312:
------------------------------------------

This change prevents incremental repairs that are making no progress from being auto-failed. Every hour, `org.apache.cassandra.repair.consistent.LocalSessions#cleanup()` triggers a status check to all participants, which resets the last-updated time, so the `repair_fail_timeout` (default 24 hours) never expires. That leaves sessions around that conflict with new incremental repairs.

[~mck] you mentioned "This has been confirmed with one user in production to fix the situation of repairs (consistently) prematurely timing out." Were they timing out before the 24-hour timer hit due to dropped messages? If so, did the user try configuring repair retries (CASSANDRA-18816)?

I think we should revert this change and look for another solution.

> Long running repairs autofail prematurely
> -----------------------------------------
>
>                 Key: CASSANDRA-20312
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20312
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Berenguer Blasi
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0.18, 4.1.9, 5.0.4, 5.1
>
>
> Repairs will autofail after a long period of inactivity but very long-running
> repairs may be incorrectly auto failed. Capture status pings as liveness info.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
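
The interaction described in the comment (an hourly cleanup whose status check refreshes the last-updated time, so a 24-hour idle timeout can never elapse) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Cassandra's code: the class `RepairTimeoutSketch`, the method `hoursUntilAutoFail`, the flag `pingsCountAsLiveness`, and the order of "check timeout, then ping" inside each cleanup pass are all hypothetical. Only the 24-hour default and the hourly cleanup interval come from the comment.

```java
import java.time.Duration;

// Hypothetical model of the behaviour discussed in the comment; not Cassandra's implementation.
class RepairTimeoutSketch {
    // Default repair_fail_timeout, as stated in the comment.
    static final Duration FAIL_TIMEOUT = Duration.ofHours(24);
    // Cleanup interval, as stated in the comment.
    static final Duration CLEANUP_INTERVAL = Duration.ofHours(1);

    /**
     * Simulates a session that makes no real progress after hour 0.
     * Returns the hour at which it is auto-failed, or -1 if it is
     * still alive once horizonHours have passed.
     */
    static long hoursUntilAutoFail(boolean pingsCountAsLiveness, long horizonHours) {
        long lastUpdateHour = 0; // last genuine progress happened at hour 0
        long step = CLEANUP_INTERVAL.toHours();
        for (long now = step; now <= horizonHours; now += step) {
            // Assumption: each cleanup pass first checks how long the session has been idle.
            if (now - lastUpdateHour >= FAIL_TIMEOUT.toHours()) {
                return now;
            }
            // Assumption: the pass then pings participants; if pings count as
            // liveness, the reply refreshes the last-updated time.
            if (pingsCountAsLiveness) {
                lastUpdateHour = now;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Pings ignored: the stalled session fails once the timeout elapses.
        System.out.println(hoursUntilAutoFail(false, 72)); // prints 24
        // Pings counted as liveness: the stalled session never fails.
        System.out.println(hoursUntilAutoFail(true, 72));  // prints -1
    }
}
```

In this model, any ping interval shorter than the fail timeout keeps a stalled session alive indefinitely, which is the conflict with new incremental repairs raised above.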