If that's the case, it sounds like a bug to me. An SSTable file in a snapshot should never be modified by Cassandra, as that may interfere with tools backing up data from the snapshots.

On 18/03/2022 19:15, James Brown wrote:
This is in 4.0.3, after running |nodetool snapshot|, that we're seeing SSTables change, yes.

James Brown
Infrastructure Architect @ easypost.com <http://easypost.com>


On 2022-03-18 at 12:06:00, Jeff Jirsa <jji...@gmail.com> wrote:
This is nodetool snapshot yes? 3.11 or 4.0?

In versions prior to 3.0, sstables would be written with -tmp- in the name, then renamed when complete, so an sstable definitely never changed once it had the final file name. With the new transaction log mechanism, we use one name and a transaction log to note what's in flight and what's not, so if the snapshot system is including sstables being written (from flush, from compaction, or from streaming), those aren't final and should be skipped.
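The in-flight state described above lives in transaction log files in each table's data directory, so a backup script could check for them before archiving. A minimal sketch, assuming the `*_txn_*.log` naming used by Cassandra 3.0+ and an illustrative data path (both are assumptions, not from this thread):

```shell
# Hedged sketch: report any in-flight transaction logs in a table's
# data directory before backing it up. The default path and the
# *_txn_*.log pattern are illustrative assumptions.
DATA_DIR="${1:-/var/lib/cassandra/data/ks/table-uuid}"
found=0
for txn in "$DATA_DIR"/*_txn_*.log; do
  # The glob itself is returned when nothing matches; skip that case.
  [ -e "$txn" ] || continue
  found=1
  echo "in-flight transaction: $txn"
done
[ "$found" -eq 0 ] && echo "no in-flight transactions"
```

A backup job could simply wait and retry while such logs are present, rather than archiving SSTables that may still be provisional.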




On Fri, Mar 18, 2022 at 11:46 AM James Brown <jbr...@easypost.com> wrote:

    We use the boring combo of cassandra snapshots + tar to backup
    our cassandra nodes; every once in a while, we'll notice tar
    failing with the following:

    tar: data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db: file changed as we read it

    I find this a bit perplexing; what would cause an sstable inside
    a snapshot to change? The only thing I can think of is an
    incremental repair changing the "repaired_at" flag on the
    sstable, but it seems like that should "un-share" the hardlinked
    sstable rather than running the risk of mutating a snapshot.
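    One way to make this failure explicit (and retryable) is to checksum the snapshot contents before and after tar runs and compare. A minimal, self-contained sketch; the mktemp setup merely simulates a snapshot directory, and in practice the path would come from the snapshot layout under the data directory:

```shell
# Hedged sketch: detect whether snapshot files changed while tar ran,
# by comparing checksums taken before and after archiving. The mktemp
# directory stands in for a real snapshot directory.
set -e
SNAP_DIR=$(mktemp -d)
echo "sstable bytes" > "$SNAP_DIR/nb-167-big-Data.db"

# Checksum, archive, checksum again; the tarball is written outside
# the snapshot directory so it is not included in the comparison.
( cd "$SNAP_DIR" && find . -type f -exec sha256sum {} + | sort ) > "$SNAP_DIR.before"
tar -C "$SNAP_DIR" -cf "$SNAP_DIR.tar" .
( cd "$SNAP_DIR" && find . -type f -exec sha256sum {} + | sort ) > "$SNAP_DIR.after"

if diff -q "$SNAP_DIR.before" "$SNAP_DIR.after" >/dev/null; then
  RESULT="snapshot stable"
else
  RESULT="snapshot changed during backup; retry"
fi
echo "$RESULT"
```

    If the check fails, the job can retake the snapshot and retry instead of shipping a possibly corrupt archive.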


    James Brown
    Cassandra admin @ easypost.com <http://easypost.com>
