If that's the case, it sounds like a bug to me. An SSTable file in the
snapshot should never be modified by Cassandra, as that may interfere
with tools backing up data from the snapshots.
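One way to confirm whether files under a snapshot directory really are being rewritten is to checksum them twice a short interval apart and see if anything differs. A rough sketch (not Cassandra tooling; the directory argument would be a real snapshots/&lt;tag&gt; path like the one in the tar error below):

```python
import hashlib
import pathlib
import time

def digests(snapdir):
    """SHA-256 of every regular file under snapdir, keyed by path."""
    return {p: hashlib.sha256(p.read_bytes()).hexdigest()
            for p in pathlib.Path(snapdir).rglob("*") if p.is_file()}

def changed_files(snapdir, delay=5.0):
    """Return the files whose contents changed between two passes."""
    before = digests(snapdir)
    time.sleep(delay)
    after = digests(snapdir)
    return [p for p in before if after.get(p) != before[p]]
```

Running this against a freshly taken snapshot should always return an empty list; a non-empty result would reproduce what tar is complaining about.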
On 18/03/2022 19:15, James Brown wrote:
Yes, this is in 4.0.3; it's after running |nodetool snapshot| that we're
seeing sstables change.
James Brown
Infrastructure Architect @ easypost.com
On 2022-03-18 at 12:06:00, Jeff Jirsa <jji...@gmail.com> wrote:
This is nodetool snapshot, yes? 3.11 or 4.0?
In versions prior to 3.0, sstables would be written with -tmp- in the
name, then renamed when complete, so an sstable definitely never
changed once it had the final file name. With the new transaction log
mechanism, we use one name and a transaction log to note what's in
flight and what's not, so if the snapshot system is including
sstables being written (from flush, from compaction, or from
streaming), those aren't final and should be skipped.
On Fri, Mar 18, 2022 at 11:46 AM James Brown <jbr...@easypost.com> wrote:
We use the boring combo of Cassandra snapshots + tar to back up
our Cassandra nodes; every once in a while, we'll notice tar
failing with the following:
tar:
data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
file changed as we read it
I find this a bit perplexing; what would cause an sstable inside
a snapshot to change? The only thing I can think of is an
incremental repair changing the "repaired_at" flag on the
sstable, but it seems like that should "un-share" the hardlinked
sstable rather than running the risk of mutating a snapshot.
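To illustrate the hardlink point above: this is not Cassandra code, just a small filesystem demonstration of why an in-place write to a live sstable would also mutate the snapshot copy (both names are hard links to the same inode), whereas the "un-share" approach of writing a new file and renaming it over the old name leaves the snapshot untouched. File names and contents here are made up.

```python
import os
import tempfile

d = tempfile.mkdtemp()
live = os.path.join(d, "nb-167-big-Data.db")  # stand-in for a live sstable
snap = os.path.join(d, "snapshot-Data.db")    # stand-in for the snapshot copy

with open(live, "wb") as f:
    f.write(b"original contents")
os.link(live, snap)  # a snapshot is a hard link to the same inode

# In-place mutation (e.g. rewriting a metadata field) is visible via
# BOTH names, so the snapshot changes under tar's feet:
with open(live, "r+b") as f:
    f.write(b"MUTATED!")
assert open(snap, "rb").read() == b"MUTATED! contents"

# "Un-sharing" instead: write a fresh file and rename it over the live
# name. The snapshot keeps the old inode and is unaffected.
tmp = live + ".tmp"
with open(tmp, "wb") as f:
    f.write(b"new generation")
os.replace(tmp, live)
assert open(live, "rb").read() == b"new generation"
assert os.stat(snap).st_nlink == 1  # link is broken; snapshot is now private
```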
James Brown
Cassandra admin @ easypost.com