Jonathan Thornburg wrote:
> And a related question:  I have a pool of ~10 external USB3 backup
> disks (all consumer-grade WD or Seagate 2.5" spinning rust, either
> 2TB or 4TB capacity each), all currently setup with FFS2 filesystems
> on top of softraid crypto (/bioctl -c C/).  Each backup is to a single
> disk, written with (roughly speaking)
>    rsync -aHESvv --delete /home/  /mnt/home/
> Each disk thus has slightly different contents depending on how
> recently I did a backup to that disk, but the vast majority of the
> files (those that haven't changed recently) should be identical
> across disks.
>
> [...]
> Thinking about how to detect/correct bit-rot in these backups

I am glad you started this thread.

First of all, the biggest threat to your data isn't silent bitrot. It is accidents. In my experience, I am far more likely to lose a file because I delete it by accident, or because I decide I no longer want it and regret trashing it a week later.

That said, my backup strategy (for desktops) is as follows:

I create a checksum list of all my files. Something like:

cd /home/user/documents
find . -type f ! -name '*.md5' -print0 | xargs -0 md5 -r | sort -k 2 > checksums_`date +%Y-%m-%d`.md5

For big datasets this is time-intensive.

When the time comes to make a backup, I run the command above and diff the newly created checksum list against the previous one to manually verify the changes:

diff "$old_checksum" "$new_checksum" | less

This is because it is important to ensure your files are good before you back them up. You don't want to commit bad data to backup storage.

I use the restic backup tool (in ports) to send the data to backup storage. I use it to commit to both SFTP-based and NFS-based repositories, but you can do the same with external media too.

restic -r /some/repository backup --exclude-file=restic_exclude.txt /

Restic has a number of advantages. First, it supports encrypted backups, so you can export your data to unsafe locations (such as cheap NAS appliances). Restic takes incremental snapshots and keeps them on record until you delete them, so the same repository can be used to restore to any point in time for which you took a snapshot. Restic also has integrated repository testing you can use to ensure your repositories are sound:

restic -r /some/repository check # Ensure index integrity

restic -r /some/repository check --read-data=true # Time-consuming verification of EVERYTHING.
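
If reading everything in one go takes too long, restic's check command also accepts --read-data-subset=n/t, which reads only the n-th of t slices of the data. A minimal sketch of rotating through the slices by ISO week number follows; the helper name subset_for_week and the choice of 5 slices are assumptions for illustration, not part of my routine.

```shell
#!/bin/sh
# Hypothetical helper: map a week number (e.g. from `date +%V`) to an
# "n/t" argument for `restic check --read-data-subset`.
# Usage: subset_for_week WEEK TOTAL_SLICES
subset_for_week() {
    week=${1#0}     # strip a leading zero so "08" is not parsed as octal
    total=$2
    echo "$(( (week - 1) % total + 1 ))/$total"
}

# Example invocation (commented out; the repository path is the same
# placeholder used above):
# restic -r /some/repository check \
#     --read-data-subset="$(subset_for_week "$(date +%V)" 5)"
```

Run weekly with 5 slices, this would cover the whole repository roughly every five weeks.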

The downsides are that this method works well only for files that don't change much (otherwise, diffing the checksums generates too much noise) and that it makes you depend on a non-standard tool for your restores.
