Thanks Jeff for the quick response. But I believe successfully starting and completing a repair every 7 days is still not enough to guarantee that a tombstone will not expire before it is repaired:
e.g., assume gc_grace_seconds=10 days and a repair takes 5 days to run:

* Day 0: Repair 1 starts and processes token A
* Day 1: Token A is deleted, resulting in Tombstone A, which will expire on Day 11
* Day 5: Repair 1 completes
* Day 7: Repair 2 starts
* Day 11: Tombstone A expires without being repaired
* Day 12: Repair 2 repairs token A and completes

Yes, in practice full repairs shouldn't take 5 days, but there are circumstances that can cause repairs to be paused or stopped for periods of time (e.g., adding new nodes to the cluster). FWIW, full repairs taking 3 days was not uncommon in my experience.

On Fri, May 16, 2025 at 1:57 PM Jeff Jirsa <jji...@gmail.com> wrote:

> On May 16, 2025, at 10:22 AM, Mike Sun <m...@msun.io> wrote:
>
> The Cassandra docs
> <https://cassandra.apache.org/doc/5.0/cassandra/managing/operating/repair.html>
> advise:
>>
>> At a minimum, repair should be run often enough that the gc grace period
>> never expires on unrepaired data. Otherwise, deleted data could reappear.
>> With a default gc grace period of 10 days, repairing every node in your
>> cluster at least once every 7 days will prevent this, while providing
>> enough slack to allow for delays.
>
> I don't think repairing at least once every 7 days if gc_grace_seconds is
> 10 days is adequate to guarantee no risk of data resurrection.
>
> I wrote this post to explain my reasoning:
> https://msun.io/cassandra-scylla-repairs/
>
> Would appreciate any feedback, thanks!
> Mike Sun
>
> To summarize the blog for those who haven’t read it:
>
> Running repairs once every gc_grace_seconds is actually insufficient
> because it doesn’t account for the duration of the repair process itself
> and the specific timing of when data ranges (tokens) are repaired. A
> tombstone created for data just after its specific token was scanned by one
> repair can expire before the next repair cycle (which only begins
> gc_grace_seconds later) manages to reach and process that particular
> token.
>
> You need to complete the repair within the gc_grace_seconds window. Having
> repair run for 3 days would be a surprise. We can certainly adjust the
> wording, but the intent of that wording isn’t “start it every 7 days
> regardless of how often it runs”, it’s “finish it every 7 days”
> (successfully).
>
> Yes, it’s not enough to start the repair every 7 days, it needs to
> complete successfully between the time the tombstone is written and the
> expiration of gc_grace_seconds.
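For anyone who wants to play with the numbers, here is a rough Python sketch of the Day 0 to Day 12 timeline from the example above. It is a toy model, not Cassandra code, and everything in it is an assumption for illustration: time is in days rather than seconds, a new full repair starts every `interval` days and takes `duration` days, and each token is repaired at some fraction `position` of the way through a run. The function names are hypothetical.

```python
"""Toy model: can a tombstone expire between two repairs of its token?

Hypothetical sketch, not Cassandra code. Time is in days, a full repair
starts every `interval` days and takes `duration` days, and a token may be
processed at any point inside a run.
"""


def token_repair_day(run_index: int, interval: float, duration: float,
                     position: float) -> float:
    """Day on which a token is repaired in a given run.

    `position` is how far through the run the token is processed:
    0.0 = first token of the run, 1.0 = last token of the run.
    """
    run_start = run_index * interval
    return run_start + position * duration


def tombstone_expires_unrepaired(deleted_day: float, gc_grace: float,
                                 next_repair_day: float) -> bool:
    """True if the tombstone becomes purgeable before its token is repaired again."""
    return deleted_day + gc_grace < next_repair_day


def worst_case_gap(interval: float, duration: float) -> float:
    """Longest possible time between two repairs of the same token:
    processed first in one run and last in the next run."""
    return interval + duration


if __name__ == "__main__":
    GC_GRACE, INTERVAL, DURATION = 10, 7, 5

    # Example from the thread: token A is first in Repair 1, last in Repair 2.
    first = token_repair_day(0, INTERVAL, DURATION, position=0.0)
    second = token_repair_day(1, INTERVAL, DURATION, position=1.0)
    deleted = 1  # token A deleted on Day 1, tombstone purgeable on Day 11

    print("token A repaired on days:", first, second)
    print("tombstone expires unrepaired:",
          tombstone_expires_unrepaired(deleted, GC_GRACE, second))
    print("schedule safe:", worst_case_gap(INTERVAL, DURATION) <= GC_GRACE)
```

With those numbers, token A is repaired on Day 0 and Day 12 while its tombstone becomes purgeable on Day 11, so the script reports that the tombstone expires unrepaired and that the schedule is not safe. Under this model, the condition that actually protects every token is that each run finishes within gc_grace of the previous run's start (here `interval + duration <= gc_grace`), which is stricter than "finish a repair every 7 days".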