Hi,

I can confirm the same issue in Cassandra 3.11.2. As an example: a TWCS table that normally has 800 SSTables (2 years' worth of daily windows plus some anticompactions) will peak at anywhere from 15k to 50k SSTables during a subrange repair.
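For a rough sense of scale, here is a minimal Python sketch of the arithmetic behind those numbers. It is not taken from Cassandra itself: the function names are made up for illustration, and it assumes that each fully compacted TWCS window settles to roughly one SSTable, so the window count is only an approximate baseline.

```python
import math


def expected_twcs_windows(retention_days: int, window_size_days: int) -> int:
    """Approximate number of time windows that can hold live data.

    Assumption (illustrative only): one SSTable per fully compacted window.
    """
    return math.ceil(retention_days / window_size_days)


def sstable_blowup(baseline: int, peak: int) -> float:
    """Ratio of the peak SSTable count during repair to the normal count."""
    return peak / baseline


# 2 years of daily windows, as in the table described above.
daily_windows = expected_twcs_windows(2 * 365, 1)

# 2 years of 30-day windows, as in the configuration quoted below.
monthly_windows = expected_twcs_windows(2 * 365, 30)

# Observed figures from this message: 800 normally, 15k to 50k during repair.
low_blowup = sstable_blowup(800, 15_000)
high_blowup = sstable_blowup(800, 50_000)

print(f"daily windows:   {daily_windows}")
print(f"30-day windows:  {monthly_windows}")
print(f"blow-up factor:  {low_blowup:.1f}x to {high_blowup:.1f}x")
```

Under these assumptions, the 800 SSTables seen normally sit close to the 730 daily windows expected for two years, while the peak during a subrange repair is roughly 19 to 63 times the baseline.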
Regards,
Martin

On Mon, Sep 24, 2018 at 9:34 AM Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:
>
> Hello,
>
> Our setup is as follows:
>
> Apache Cassandra: 3.0.17
> Cassandra Reaper: 1.3.0-BETA-20180830
> Compaction: {
>   'class': 'TimeWindowCompactionStrategy',
>   'compaction_window_size': '30',
>   'compaction_window_unit': 'DAYS'
> }
>
> We have two column families which differ only in the way data is written: one
> is always with a TTL (of 2 years), the other -- without a TTL. The data is
> time-series-like, append-only, no explicit updates or deletes. The data goes
> back as far as ~15 months.
>
> We have scheduled a non-incremental repair using Cassandra Reaper to run
> every week.
>
> Now we are observing an unexpected effect such that often *all* of the
> SSTable files on disk are modified (touched by repair) for both of the TTLd
> and non-TTLd tables.
>
> This is not expected, since the old files from past months have been
> repeatedly repaired a number of times already.
>
> If it is an effect caused by over-streaming, why does Cassandra find any
> differences in the files from past months in the first place? We expect that
> after a file from 2 months ago (or earlier) has been fully repaired once,
> there is no possibility for any more differences to be discovered.
>
> Is this not a reasonable assumption?
>
> Regards,
> --
> Alex