I agree we need to do a better job of wording this so people can understand what is happening.
For your exact example, you are looking at too broad a scope. The requirement applies not at the full-cluster level but at the “token range” level at which repair operates: a given token range needs to have a repair start and complete within the gc_grace sliding window. For your example of a repair cycle that takes 5 days and is started every 7 days, assuming you perform each cycle in the same order around the nodes every time, a given node will have been repaired within 7 days, even though the time from the start of repair 1 to the finish of repair 2 was more than 7 days. The time from the start of “token ranges repaired on day 0” to the finish of “token ranges repaired on day 7” is less than the gc_grace window.

-Jeremiah Jordan

On May 16, 2025 at 2:03:00 PM, Mike Sun <m...@msun.io> wrote:

> The wording is subtle and can be confusing...
>
> It's important to distinguish between:
> 1. "You need to start and complete a repair within any gc_grace_seconds
> window"
> 2. "You need to start and complete a repair within gc_grace_seconds"
>
> #1 is a sliding time window: any interval between when the tombstone is
> written (tombstone_created_time) and when it expires
> (tombstone_created_time + gc_grace_seconds)
>
> #2 is a duration bound for the repair time
>
> My post is saying that to ensure the #1 requirement, you actually need to
> "start and complete two consecutive repairs within gc_grace_seconds"
>
>
> On Fri, May 16, 2025 at 2:49 PM Mike Sun <m...@msun.io> wrote:
>
>> > You need to *start and complete* a repair within any gc_grace_seconds
>> window.
>> Exactly this. And since "any gc_grace_seconds" does not mean "any
>> gc_grace_window from which a repair starts"... the requirement needs to be
>> that the duration to "start and complete" two consecutive full repairs is
>> within gc_grace_seconds"... that will ensure a repair "starts and
>> completes" within "any gc_grace_seconds" window
>>
>>
>> On Fri, May 16, 2025 at 2:43 PM Mick Semb Wever <m...@apache.org> wrote:
>>
>>> .
>>>
>>>> e.g., assume gc_grace_seconds=10 days, a repair takes 5 days to run
>>>> * Day 0: Repair 1 starts and processes token A
>>>> * Day 1: Token A is deleted resulting in Tombstone A that will expire
>>>> on Day 11
>>>> * Day 5: Repair 1 completes
>>>> * Day 7: Repair 2 starts
>>>> * Day 11: Tombstone A expires without being repaired
>>>> * Day 12: Repair 2 repairs Token A and completes
>>>>
>>>
>>> You need to *start and complete* a repair within any gc_grace_seconds
>>> window.
>>> In your example no repair started and completed in the Day 1-11 window.
>>>
>>> We do need to word this better, thanks for pointing it out Mike.
>>>
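
P.S. As a rough sanity check of the two schedules discussed in this thread, here is a minimal Python sketch. It is not Cassandra code: the function names are made up for illustration, time is counted in whole days, and each repair of a single token range is modelled as a (start_day, end_day) pair. For the "5-day cycle every 7 days" case I am assuming gc_grace is 10 days, as in the quoted example, and that a given range is hit at the same offset in every cycle.

```python
def tombstone_is_safe(created, gc_grace, repairs):
    """Return True if some repair of this token range both starts at or
    after the tombstone is written and completes before it expires."""
    expires = created + gc_grace
    return any(start >= created and end <= expires for start, end in repairs)


def first_unsafe_day(gc_grace, repairs, horizon):
    """Return the first day in [0, horizon) on which a newly written
    tombstone would expire unrepaired, or None if every day is covered."""
    for day in range(horizon):
        if not tombstone_is_safe(day, gc_grace, repairs):
            return day
    return None


# Quoted example: token A is repaired on day 0 and again on day 12.
mike_token_a = [(0, 0), (12, 12)]
print(first_unsafe_day(10, mike_token_a, horizon=2))  # 1

# Same-order cycles started every 7 days: a given range is repaired
# on days 0, 7, 14, 21 (hypothetical offsets within each cycle).
same_order = [(0, 0), (7, 7), (14, 14), (21, 21)]
print(first_unsafe_day(10, same_order, horizon=15))  # None
```

The horizon is kept shorter than the schedule on purpose, since days after the last modelled repair would otherwise look unsafe only because the list of repairs ends.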