Hello,

Our setup is as follows:

Apache Cassandra: 3.0.17
Cassandra Reaper: 1.3.0-BETA-20180830
Compaction: {
       'class': 'TimeWindowCompactionStrategy',
       'compaction_window_size': '30',
       'compaction_window_unit': 'DAYS'
     }

We have two column families which differ only in the way data is written:
one is always written with a TTL of 2 years, the other without any TTL.  The
data is time-series-like, append-only, no explicit updates or deletes.  The
data goes back as far as ~15 months.
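
To give a sense of how many time windows are involved, here is a minimal
sketch of the bucketing arithmetic.  It assumes that TWCS aligns its windows
to the Unix epoch in multiples of the window size; that is our reading of
the strategy, not something we have verified against the 3.0.17 code.  The
function names are made up for illustration.

```python
# Window length for compaction_window_size=30, compaction_window_unit=DAYS.
WINDOW_SECONDS = 30 * 24 * 3600


def window_start(timestamp_seconds, window_seconds=WINDOW_SECONDS):
    """Lower bound of the window containing the timestamp.

    Assumption: windows are aligned to the Unix epoch.
    """
    return timestamp_seconds - (timestamp_seconds % window_seconds)


def windows_spanned(oldest_seconds, newest_seconds,
                    window_seconds=WINDOW_SECONDS):
    """Number of distinct windows touched between two timestamps."""
    first = window_start(oldest_seconds, window_seconds)
    last = window_start(newest_seconds, window_seconds)
    return (last - first) // window_seconds + 1
```

Under that assumption, ~15 months of data spans roughly 16 windows per
table, and all but the newest one should no longer receive any writes.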

We have scheduled a non-incremental repair using Cassandra Reaper to run
every week.

Now we are observing an unexpected effect: often *all* of the SSTable files
on disk are modified (touched by repair), for both the TTL'd and the
non-TTL'd tables.

This is not expected, since the old files from past months have already
been repaired a number of times.

If it is an effect caused by over-streaming, why does Cassandra find any
differences in the files from past months in the first place?  We expect
that after a file from 2 months ago (or earlier) has been fully repaired
once, there is no possibility for any more differences to be discovered.

Is this not a reasonable assumption?

Regards,
-- 
Alex
