Thanks, Jeff, for the quick response. But I believe that successfully starting
and completing a repair every 7 days is still not enough to guarantee that a
tombstone will not expire before it is repaired:

For example, assume gc_grace_seconds = 10 days and a repair takes 5 days to run:
* Day 0: Repair 1 starts and processes token A
* Day 1: Token A is deleted, resulting in Tombstone A, which will expire on
  Day 11
* Day 5: Repair 1 completes
* Day 7: Repair 2 starts
* Day 11: Tombstone A expires without being repaired
* Day 12: Repair 2 repairs Token A and completes
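
A minimal sketch of the timeline above, in Python, in case it helps to check
the arithmetic. It is a simplified model, not how Cassandra schedules repairs:
the Repair class, its token_day field, and both helper functions are
hypothetical names introduced here for illustration. It assumes a tombstone is
safe only if some repair processes its token on or after the delete and before
the tombstone expires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repair:
    """One repair run, reduced to the day it processes the token of interest."""
    start_day: float
    end_day: float
    token_day: float  # hypothetical field: day this run repairs Token A


def tombstone_repaired_in_time(tombstone_day, gc_grace_days, repairs):
    """True if some repair processes the token after the delete and before expiry."""
    expiry_day = tombstone_day + gc_grace_days
    return any(tombstone_day <= r.token_day < expiry_day for r in repairs)


def worst_case_gap_days(start_interval_days, repair_duration_days):
    """Longest gap between two consecutive repairs of the same token.

    Worst case: the token is processed first in one run and last in the next.
    """
    return start_interval_days + repair_duration_days


# The schedule from the timeline above (gc_grace_seconds = 10 days).
repairs = [
    Repair(start_day=0, end_day=5, token_day=0),    # Repair 1
    Repair(start_day=7, end_day=12, token_day=12),  # Repair 2
]
covered = tombstone_repaired_in_time(tombstone_day=1, gc_grace_days=10, repairs=repairs)
gap = worst_case_gap_days(start_interval_days=7, repair_duration_days=5)
print(covered, gap)  # prints: False 12
```

Under this model, a schedule is only safe when the start interval plus the
repair duration is less than gc_grace_seconds. With 5-day repairs started every
7 days, the worst-case gap is 12 days, which exceeds the 10-day grace period.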

Yes, in practice, full repairs shouldn't take 5 days, but there can be
circumstances that cause repairs to be paused or stopped for periods of
time (e.g., adding new nodes to the cluster). FWIW, full repairs taking 3
days was not uncommon in my experience.

On Fri, May 16, 2025 at 1:57 PM Jeff Jirsa <jji...@gmail.com> wrote:

>
>
> On May 16, 2025, at 10:22 AM, Mike Sun <m...@msun.io> wrote:
>
> The Cassandra docs
> <https://cassandra.apache.org/doc/5.0/cassandra/managing/operating/repair.html>
>  advise:
>>
>> At a minimum, repair should be run often enough that the gc grace period
>> never expires on unrepaired data. Otherwise, deleted data could reappear.
>> With a default gc grace period of 10 days, repairing every node in your
>> cluster at least once every 7 days will prevent this, while providing
>> enough slack to allow for delays.
>
>
> I don't think repairing at least once every 7 days if gc_grace_seconds is
> 10 days is adequate to guarantee no risk of data resurrection.
>
> I wrote this post to explain my reasoning:
> https://msun.io/cassandra-scylla-repairs/
>
> Would appreciate any feedback, thanks!
> Mike Sun
>
>
>
> To summarize the blog for those who haven’t read it:
>
> Running repairs once every gc_grace_seconds is actually insufficient
> because it doesn’t account for the duration of the repair process itself
> and the specific timing of when data ranges (tokens) are repaired. A
> tombstone created for data just after its specific token was scanned by one
> repair can expire before the next repair cycle (which only begins
> gc_grace_seconds later) manages to reach and process that particular
> token.
>
> You need to complete the repair within the gc_grace_seconds window. Having
> repair run for 3 days would be a surprise. We can certainly adjust the
> wording, but the intent of that wording isn’t “start it every 7 days
> regardless of how often it runs”, it’s “finish it every 7 days”
> (successfully).
>
>
>
> Yes, it’s not enough to start the repair every 7 days, it needs to
> complete successfully between the time the tombstone is written and the
> expiration of gc_grace_seconds.
>
>
>
