Hi,
I can confirm the same issue in Cassandra 3.11.2.

As an example: a TWCS table that normally has 800 SSTables (2 years'
worth of daily windows plus some anticompactions) will peak at
anywhere from 15k to 50k SSTables during a subrange repair.
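
For anyone who wants to watch this on their own nodes, here is a rough
sketch of how the numbers above can be checked. It assumes that each live
SSTable has exactly one component file ending in "-Data.db" sitting directly
in the table's data directory, and that snapshots/backups live in
subdirectories (which are not descended into). The function names and the
path in the usage note are made up for illustration; adjust them to your
install.

```python
import math
import os


def expected_windows(span_days, window_days):
    """Rough count of TWCS windows needed to cover span_days of data."""
    return math.ceil(span_days / window_days)


def count_sstables(table_dir):
    """Count SSTables in table_dir by their *-Data.db component files.

    Only the top level of the directory is inspected, so snapshots,
    backups and index subdirectories are ignored (assumption: live
    SSTables sit directly in the table directory).
    """
    count = 0
    for name in os.listdir(table_dir):
        path = os.path.join(table_dir, name)
        if os.path.isfile(path) and name.endswith("-Data.db"):
            count += 1
    return count
```

Usage would be something like count_sstables("/var/lib/cassandra/data/my_ks/my_table-<id>")
(hypothetical path), polled every minute or so while the repair runs.
expected_windows(730, 1) gives the baseline of 730 daily windows for two
years of data, which is where most of the ~800 in my example comes from.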


Regards,

Martin
On Mon, Sep 24, 2018 at 9:34 AM Oleksandr Shulgin
<oleksandr.shul...@zalando.de> wrote:
>
> Hello,
>
> Our setup is as follows:
>
> Apache Cassandra: 3.0.17
> Cassandra Reaper: 1.3.0-BETA-20180830
> Compaction: {
>        'class': 'TimeWindowCompactionStrategy',
>        'compaction_window_size': '30',
>        'compaction_window_unit': 'DAYS'
>      }
>
> We have two column families which differ only in the way data is written: one 
> is always with a TTL (of 2 years), the other -- without a TTL.  The data is 
> time-series-like, append-only, no explicit updates or deletes.  The data goes 
> back as far as ~15 months.
>
> We have scheduled a non-incremental repair using Cassandra Reaper to run 
> every week.
>
> Now we are observing an unexpected effect: often *all* of the SSTable
> files on disk are modified (touched by repair), for both the TTL'd and
> the non-TTL'd tables.
>
> This is not expected, since the old files from past months have been 
> repeatedly repaired a number of times already.
>
> If it is an effect caused by over-streaming, why does Cassandra find any 
> differences in the files from past months in the first place?  We expect that 
> after a file from 2 months ago (or earlier) has been fully repaired once, 
> there is no possibility for any more differences to be discovered.
>
> Is this not a reasonable assumption?
>
> Regards,
> --
> Alex
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
