3 - Now on to the subject of backing up a system. I am using rsync as
my primary backup tool. Celejar, thank you for suggesting borg. I
did install it and look at it briefly. But for several reasons, I'm
not sure I want to use that right now.
It is of course one more thing to learn, use, and maintain.
I get the sense that it stores data as a mass of chunks, rather than
as individual files, so if something goes wrong it could really go
wrong? It is newer, and perhaps less proven than some older
alternatives.
Hi, I would like to say a few words in favor of borgbackup. borgbackup
is an amazing tool (as many of us know).
Don't worry about deduplication. Deduplication is a feature.
Deduplication is now widely used in many environments because it is a
safe practice when done well, and because the amount of data has grown
a lot in recent years.
The problem you describe ("if something goes wrong") is a real problem
today, and losing data is bad. I would say that things can go wrong in
much the same way with zfs, vdo, rsync + hardlink, borgbackup, restic
and any other tool/fs that provides deduplication. Suppose that you have
a broken deduplicated chunk: all files pointing to this chunk are
faulty. This type of problem can also happen on any device that develops
bad blocks, bit rot and so on. Deduplication is a feature that lets you
save space. "If something is going to go bad, it will go bad with or
without deduplication" (the biggest errors are human errors), and you
will only know that your backup really works when you try (or need) to
restore from it; this is why a periodic restore test is needed. You
could avoid deduplication and still lose all your data because your
drive dies. So nothing changes; it is a feature.
As for how borgbackup stores data inside the repository, I would say
that you can run into this problem with tools like bacula or bareos, or
with rsync + hardlink, too. Imagine that with bacula you keep 3 backup
cycles, each with a full backup of 1TB plus 30 daily incrementals
(monthly cycle). If the volume that contains the full backup of the
first cycle becomes faulty, all backups based on that full backup are
useless, so the first backup cycle is inconsistent and red-flagged. To
reduce the risk of data corruption, you should use RAID, better hardware
and ECC RAM.
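To make the "one broken chunk, many faulty files" point concrete, here
is a toy sketch of a deduplicating store. It is only an illustration
under simplifying assumptions: fixed 4-byte chunks, an in-memory dict as
the "repository", and function names (store, damaged_files) invented
for this example. Real tools such as borgbackup chunk and store data
differently.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, only to keep the demo readable


def store(repo, data):
    """Split data into chunks, keep each distinct chunk once, return its chunk ids."""
    ids = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        cid = hashlib.sha256(chunk).hexdigest()
        repo.setdefault(cid, chunk)  # an identical chunk is not stored twice
        ids.append(cid)
    return ids


def damaged_files(repo, index):
    """Return the names of files that reference a chunk failing its checksum."""
    bad = {cid for cid, chunk in repo.items()
           if hashlib.sha256(chunk).hexdigest() != cid}
    return sorted(name for name, ids in index.items() if bad.intersection(ids))


repo = {}
index = {
    "a.txt": store(repo, b"AAAABBBB"),
    "b.txt": store(repo, b"AAAACCCC"),
    "c.txt": store(repo, b"DDDD"),
}

# Simulate bit rot in the one chunk that a.txt and b.txt share.
shared = hashlib.sha256(b"AAAA").hexdigest()
repo[shared] = b"AAAX"

print(len(repo))                   # 4 chunks stored for 5 chunk references
print(damaged_files(repo, index))  # ['a.txt', 'b.txt']
```

The same checksum comparison is also what makes the damage detectable,
which is the other half of the argument: the risk is real, but it can
be checked for.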
Compressed archives have the same problem so you should not compress
your data.
In the case of borgbackup, you can run checks on the repository and on
the archives inside it to find out whether something is broken.
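As a rough sketch of how such a check could be scripted: the helper
below only builds and runs a `borg check` command line. It assumes the
borg 1.x command syntax (repository path as the last argument); the
repository path and the function names are placeholders made up for
this example.

```python
import shutil
import subprocess


def borg_check_command(repo, verify_data=False):
    """Build the argument list for a `borg check` run against one repository."""
    cmd = ["borg", "check"]
    if verify_data:
        cmd.append("--verify-data")  # slower: also re-reads and verifies stored data
    cmd.append(repo)
    return cmd


def run_check(repo, verify_data=False):
    """Run the check and report success; return None when borg is not installed."""
    if shutil.which("borg") is None:
        return None
    return subprocess.run(borg_check_command(repo, verify_data)).returncode == 0


# Hypothetical repository path, shown only to print the resulting command.
print(borg_check_command("/srv/backup/repo", verify_data=True))
```

A check like this complements a periodic restore test; it does not
replace it.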
rsync + hardlink is an inefficient way to do deduplication. It works and
saves some space on the device, but nothing more. (I'm not saying that
rsync is not a very good tool.)
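Why "inefficient"? Hard links deduplicate whole files only. The sketch
below imitates, in a very reduced form, what a snapshot scheme built on
rsync's --link-dest option does: an unchanged file is hard-linked to the
previous snapshot, while a file that changed by a single line is stored
again in full. It handles a flat directory only, and the function and
file names are invented for the example.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(src, dest, prev=None):
    """Copy src into dest, hard-linking files that are identical in prev."""
    os.makedirs(dest)
    for name in sorted(os.listdir(src)):
        s = os.path.join(src, name)
        d = os.path.join(dest, name)
        p = os.path.join(prev, name) if prev else None
        if p and os.path.isfile(p) and filecmp.cmp(s, p, shallow=False):
            os.link(p, d)       # unchanged file: no new data blocks used
        else:
            shutil.copy2(s, d)  # changed or new file: stored again in full


def same_inode(root, name):
    """True when day1 and day2 share the same underlying file for name."""
    a = os.stat(os.path.join(root, "day1", name))
    b = os.stat(os.path.join(root, "day2", name))
    return a.st_ino == b.st_ino


root = tempfile.mkdtemp()
src = os.path.join(root, "src")
os.makedirs(src)
with open(os.path.join(src, "same.txt"), "w") as f:
    f.write("unchanged\n")
with open(os.path.join(src, "big.log"), "w") as f:
    f.write("line\n" * 1000)

snapshot(src, os.path.join(root, "day1"))

# Append a single line to the large file, then take the next snapshot.
with open(os.path.join(src, "big.log"), "a") as f:
    f.write("one more line\n")
snapshot(src, os.path.join(root, "day2"), prev=os.path.join(root, "day1"))

print(same_inode(root, "same.txt"))  # True: shared with the previous snapshot
print(same_inode(root, "big.log"))   # False: a full second copy was written
```

A chunk-based tool would store only the new part of big.log, which is
the difference I am pointing at.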
Also, the borg website seems to suggest that they will be developing
aggressively, and breaking compatibility over time. Where else have I
heard that lately? (*cough* SystemD *cough* . . .)
About this: backup is a practice that lets you protect your data. If you
really care about your data, you should use a stable system and you
should not jump to the latest version as soon as it is released. If your
data are important and you update the software too frequently, you will
certainly run into some problems. This is why stable distributions like
Debian, CentOS, Ubuntu and others keep the same software for many years
without changing the version, and only push security or critical fixes.
Still about borg: when a new release comes out, the maintainers report
any incompatibility with previous versions, and again, if you care about
your data, you should read the release notes before any update. This is
good practice, and not only for backups.
And, I always worry that today's free/libre can be tomorrow's
proprietary/unfree.
About this: it is not really a problem. If borgbackup stops being open
source/free, someone will probably fork the latest open source version
and that's all. [I know that this is off-topic for debian but this is
only an example, sorry] For example, when Big Blue bought Red Hat, many
CentOS users were worried about the future of CentOS. If Big Blue were
to drop CentOS at some point, the community would fork it without any
problem. And again, if your preferred software, in this case borgbackup,
becomes non-free, you could keep using it until you deploy another
solution based on other free/open source software and replace it (or you
could pay for it, considering the payment a support donation). I don't
think that nowadays, when open source is a business model, software
released as open source is likely to change its license. Generally,
companies release open source software and sell services such as
support. borgbackup has been around since 2010, and many companies, like
rsync.net and others, sell storage space for borgbackup (please don't
consider this spam, it is only an example).
Andy, you suggested having off-site backups. You are correct that
local-only backups do not protect against things like fire or theft.
But off-site backups require a place for them to be, the ongoing
effort of getting them there (and back, in a timely manner, if
needed), and some cost involved. And the storage place is also
subject to "disasters" , or just going out of business overnight.
On-line backups have the same problems, in addition to the fact that
once your data is "out there", even if it is encrypted, it can be
"cracked" by governments, corporations, or individuals, at their
leisure.
I'm not sure my threat model really justifies off-site backups at this point.
About remote off-site backup: again, if your data are important and
critical for you, you should consider it as part of the 3-2-1 method. If
you are paranoid about your privacy and data, you can avoid the public
cloud, get a dedicated server/vps with the necessary storage and provide
the remote storage yourself. It has a cost, but if you need it you can
do it. Again, if your data are critical, an off-site backup could save
you, and off-site backup is just another good practice.
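For reference, the 3-2-1 method is usually stated as: at least 3 copies
of the data, on at least 2 different kinds of media, with at least 1
copy off-site. A tiny sketch of that rule as a check follows; the Copy
class and the sample entries are hypothetical and exist only for this
illustration.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    name: str
    medium: str    # e.g. "internal disk", "usb disk", "remote server"
    offsite: bool


def satisfies_3_2_1(copies):
    """At least 3 copies, on at least 2 kinds of media, with at least 1 off-site."""
    return (len(copies) >= 3
            and len({c.medium for c in copies}) >= 2
            and any(c.offsite for c in copies))


local_only = [
    Copy("working data", "internal disk", offsite=False),
    Copy("rsync mirror", "usb disk", offsite=False),
]
with_vps = local_only + [Copy("borg repo on a vps", "remote server", offsite=True)]

print(satisfies_3_2_1(local_only))  # False: two copies, none off-site
print(satisfies_3_2_1(with_vps))    # True
```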
5 - You also mentioned encrypted backups. Of course, you have to
trust the encryption methodology. And it should not be such a hassle
that you don't actually use it.
And . . . don't lose your key! That could really ruin your day. Or life.
I have actually considered (and still am) encrypting the backups,
perhaps using full disk encryption. Maybe LUKS or something similar.
But . . . I do remember using disk encryption on the home partition
of my system, years ago. It worked fine for a short time, and then it
just didn't. Not a lost password, it just stopped working. Lost data
forever. Not fun. Never forgot that.
About encryption: this is a feature. With borgbackup you can disable it
(like compression). I have always avoided encrypting my files because of
the risk of loss, but if you need an off-site backup this is a good
feature. Do you trust the encryption methodology of every
protocol/software that you use? Crypto can be strong today and weak
tomorrow. Simply adopt new methods when something becomes "obsolete" or
deprecated.
Another word about borgbackup. Borgbackup works in a similar way to
rsync, but rsync was born as a sync tool (and was later used as the
transfer method in rsync-based backup scripts), while borgbackup was
born as a backup tool. When borg runs over ssh against a remote
repository, borg must also be installed on the destination host (like
rsync) to perform its operations. If you don't trust the push method
(where borgbackup and ssh offer some options to protect your security),
you could use sshfs (or similar) and run in pull mode.
My short story: I'm migrating away from bacula to borgbackup, using it
with a central server that runs commands on the target hosts using ssh
keys. To do this you can use ansible or a simple ssh script. Why? bacula
is great software, but it has become too complicated over time, and
borgbackup solves some problems for me (like encryption, deduplication,
off-site backup and backup speed == less time).
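A minimal sketch of the "simple ssh script" idea, not my actual setup:
the central server loops over the hosts and asks each one, over ssh, to
run `borg create` against its own repository. Host names, the user, the
repository URLs and the paths are all placeholders, and the borg 1.x
REPO::ARCHIVE syntax is assumed.

```python
import shlex


def remote_backup_command(host, repo, paths, user="backup"):
    """Build the ssh command the central server would run for one target host."""
    # {now:...} is expanded by borg itself, so the braces are doubled here
    # to keep Python's f-string from interpreting them.
    archive = f"{repo}::{host}-{{now:%Y-%m-%d}}"
    borg = ["borg", "create", "--stats", archive, *paths]
    # shlex.join quotes each argument so the remote shell passes it unchanged.
    return ["ssh", f"{user}@{host}", shlex.join(borg)]


# Hypothetical inventory: target host -> repository it backs up to.
HOSTS = {
    "web1": "ssh://backup@central/srv/borg/web1",
    "db1": "ssh://backup@central/srv/borg/db1",
}

for host, repo in HOSTS.items():
    print(remote_backup_command(host, repo, ["/etc", "/home"]))
```

Passing each list to subprocess.run() would execute it; that part
depends on ssh keys already being in place, so it is left out here.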
Ah, I'm not an expert on borgbackup, but it seems very useful when
working with systems like ansible to run backups on a large number of
hosts, because it does not have the barriers that tools like bacula have
due to their client/server model (director/storage).
Suggestion: if you want to use it, try it thoroughly and see if it suits
your needs, and (sorry for this) run appropriate tests before replacing
your current solution. It is not the best backup tool in the world and
it is not enterprise-grade, but hey, it is amazing and does the job very
well. Just give it a try.