Alessandro Baggi wrote:
> If I'm not wrong, deduplication "is a technique for eliminating
> duplicate copies of repeating data".
>
> I'm not a borg expert, but it performs deduplication on data chunks.
>
> Suppose that you back up 2000 files in a day and inside this backup a
> chunk is deduped and referenced by 300 files. If the deduped chunk is
> broken, I think you will lose it in all 300 referencing files. This is
> not good for me.
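For anyone following along, here is what that chunk sharing actually looks
like. A minimal sketch in plain Python (a toy model, not borg's actual
code; the file names and the fixed chunk size are made up for
illustration, and real tools use content-defined chunking):

    import hashlib

    CHUNK_SIZE = 4   # tiny for the example

    store = {}   # chunk hash -> chunk bytes; each unique chunk stored once
    index = {}   # file name  -> list of chunk hashes

    def backup(name, data):
        """Split data into chunks and store each unique chunk only once."""
        hashes = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()
            store.setdefault(h, chunk)   # deduplication happens here
            hashes.append(h)
        index[name] = hashes

    def restore(name):
        return b"".join(store[h] for h in index[name])

    backup("a.txt", b"AAAABBBB")
    backup("b.txt", b"AAAACCCC")   # the "AAAA" chunk is shared, stored once
    print(len(store))              # 3 unique chunks backing 4 logical chunks

Corrupt the single stored "AAAA" chunk and both files restore wrong, which
is the failure mode described above; it is also why real repositories
checksum every chunk.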
Look at the explanation by Linux-Fan; I think it is pretty good. It covers
that scenario. However, if your backup system (disks or whatever) is
broken, it cannot be considered a backup system at all. I think
deduplication is a great thing nowadays: people need to back up terabytes,
take care of retention, etc. I do not share your concerns at all.

> If your main dataset has a broken file, no problem, you can recover
> from backups.
>
> If your saved deduped chunk is broken, all files that have a reference
> to it could be broken. I also think that the same chunk will be used for
> successive backups (again because of deduplication), so this single
> chunk could be used from backup1 to backupN.

This is not true.

> It also has an integrity check, but I don't know if it checks for this.
> I have also read that an integrity check on a big dataset could require
> too much time.
>
> In my mind a backup is a copy of a file at one point in time, and if
> needed a copy from another point in time can be picked, but it should
> not be a reference to a previous copy. Today there are people who make
> backups on tape (expensive) for reliability. I run backups on disks.
> Disks are cheap, so compression (which costs time during backup and
> restore) and deduplication (which adds complexity) are not needed for
> me, and they don't really affect my free disk space because I can add
> a disk.

I think it depends on how far you want to go and how precious the data is.
Magnetic disks and tapes can be destroyed by an EMP or similar. An SSD,
despite its price, can fail, and when it fails you cannot recover anything
from it. So... there are rules for securely preserving backups, but all of
this is very expensive.

> Rsnapshot uses hardlinks, which is similar.
>
> All these solutions are valid if they fit your needs. You must decide
> how important the data inside your backups is and whether losing a
> deduped chunk could damage your backup dataset across the timeline.

No, unless the corruption is on the backup server, and if that happens...
well, you should consider the backup server broken. I do not think this
has anything to do with deduplication.

> Ah, if you have multiple servers to back up, I prefer bacula, because it
> can pull data from hosts and can back up multiple servers from the same
> point (maybe using a separate bacula-sd daemon with dedicated storage
> for each client).
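P.S. On the integrity-check question above: because chunks are addressed
by their own hash, a repository can detect bit rot simply by re-hashing
every chunk and comparing the result against its key. A sketch of the
principle (this is not how `borg check` is implemented internally, just
the underlying idea, reusing the toy chunk-store layout from earlier):

    import hashlib

    # Toy repository: chunk hash -> chunk bytes.
    store = {hashlib.sha256(c).hexdigest(): c
             for c in (b"AAAA", b"BBBB", b"CCCC")}

    def verify(store):
        """Re-hash every chunk; any mismatch means on-disk corruption."""
        return [h for h, chunk in store.items()
                if hashlib.sha256(chunk).hexdigest() != h]

    print(verify(store))   # [] -> repository is clean
    store[hashlib.sha256(b"AAAA").hexdigest()] = b"AAAX"  # simulate bit rot
    print(verify(store))   # one bad hash; every file referencing it is hit

Such a scan has to read every stored byte, which is why a full check of a
multi-terabyte repository does take a long time.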