Leslie Rhorer wrote:
> In this context what, exactly, is de-duplication?  I fail to see how
> any meaningful interpretation of the term is salient to backups.  To
> compression, yes, to symbolic interpretation, surely, and to saving
> space on a drive and reducing access times, you bet.  To backups?  I
> don't really see it, unless you mean hard-link handling, which it does
> most admirably.  Soft links, of course, are fairly straightforward.  DAR
> does handle sparse files exceedingly well.
>
Imagine you have a classical backup scheme: daily incrementals, a full weekly and a full monthly. Imagine you keep 3 weekly fulls (until the end of the month) and 12 monthly fulls (until the end of the year). You then have to maintain 15 full backups plus the 6 daily incrementals. How much space do you need on your backup storage? This is why I asked what your active size is.

Now imagine I have 2TB of data. Even if I compress this data, let's say with an average 60% ratio (size reduced by 60%), it is 800GB per full backup. 15+ copies means 12TB+. Of course, if you have video or audio such as mp3, it is already compressed, so the backup compression ratio goes down and the space needed goes up.

Now here comes the trick with deduplication. The backup system makes one full backup (800GB) and then keeps track of the bits that changed (it is not that simple, but it will do for the example). Only those are backed up. Some systems reach a ratio of 90%. So to keep your 15+ copies with a deduplication ratio of ~80% you need about 3TB.

>> May I ask what is your active disk size
>
> What do you mean by "active" disk size?  In each of my main arrays
> there are 8 spindles of 8 Terabytes each.  Six spindles worth are
> encoded with flat data and 2 spindles worth with parity.  RAID 6 does
> not assign any disks specifically for data or for parity as RAID 3 and
> RAID 4 do.  Instead, with both RAID 5 and RAID 6, parity is distributed
> across every drive, and the data is also distributed across all the
> drives, interleaved with the parity.  All put together, the available
> volume size is 46.9 Terabytes (43.6 Tebibytes) after formatting.  The
> main server currently has 22 Terabytes of data on it.  The backup server
> is effectively full.
>

So you have a perfect candidate for deduplication :) because I guess you can keep only a few copies of that size on the backup server.
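To make the space estimate in this reply easier to follow, here is a rough sketch of the arithmetic in Python. It assumes that the "60% ratio" and the "~80% deduplication ratio" both mean the fraction of space saved, and it ignores the daily incrementals; the function name `backup_storage_tb` is made up for this illustration and is not part of any backup tool.

```python
def backup_storage_tb(data_tb, compression_saving, full_copies, dedup_saving=0.0):
    """Rough storage estimate in TB for a set of retained full backups.

    compression_saving and dedup_saving are fractions of space saved
    (0.60 means the result is 40% of the input). This is an assumed
    reading of the ratios quoted in the post.
    """
    per_full_tb = data_tb * (1.0 - compression_saving)
    return per_full_tb * full_copies * (1.0 - dedup_saving)


# Retention from the example: 3 weekly fulls + 12 monthly fulls.
weekly_fulls, monthly_fulls = 3, 12
copies = weekly_fulls + monthly_fulls

per_full = backup_storage_tb(2.0, 0.60, 1)            # one compressed full
plain = backup_storage_tb(2.0, 0.60, copies)          # 15 fulls, no dedup
deduped = backup_storage_tb(2.0, 0.60, copies, 0.80)  # 15 fulls, ~80% dedup

print(f"one full backup:      {per_full:.1f} TB")
print(f"15 fulls, no dedup:   {plain:.1f} TB")
print(f"15 fulls, 80% dedup:  {deduped:.1f} TB")
```

Under these assumptions the deduplicated figure comes out at roughly 2.4 TB for the fulls alone; with the incrementals and some headroom on top, that lines up with the "about 3TB" quoted above.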
Here is a live example from one of the servers, with the borg backup archive:

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:                5.47 TB              3.37 TB            483.58 GB

                       Unique chunks         Total chunks
Chunk index:                 2342114             23644792

The last monthly:

Archive name: 2020-07-04T22:01:21
Archive fingerprint: xxxxxxxxxxxxxxxxx
Comment:
Hostname: xxxx
Username: xxxx
Time (start): Sat, 2020-07-04 22:01:32
Time (end): Sat, 2020-07-04 23:28:51
Duration: 1 hours 27 minutes 19.92 seconds
Number of files: 3416089
Command line: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Utilization of maximum supported archive size: 1%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              807.19 GB            493.13 GB             14.90 GB
All archives:                5.47 TB              3.37 TB            483.58 GB

                       Unique chunks         Total chunks
Chunk index:                 2342114             23644792
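For anyone who wants to turn the "All archives" figures above into percentages, the following is a small sketch. It assumes the sizes are in decimal units (1 TB = 1000 GB); the helper name `saving` is only for illustration.

```python
GB_PER_TB = 1000.0  # assumption: decimal units in the borg output


def saving(before_gb, after_gb):
    """Fraction of space saved when going from before_gb to after_gb."""
    return 1.0 - after_gb / before_gb


# "All archives" line from the borg output above.
original_gb = 5.47 * GB_PER_TB
compressed_gb = 3.37 * GB_PER_TB
deduplicated_gb = 483.58

total_saving = saving(original_gb, deduplicated_gb)         # compression + dedup
dedup_only_saving = saving(compressed_gb, deduplicated_gb)  # dedup on top of compression
unique_chunk_share = 2342114 / 23644792                     # unique chunks / total chunks

print(f"saved vs. original size:    {total_saving:.1%}")
print(f"saved vs. compressed size:  {dedup_only_saving:.1%}")
print(f"share of unique chunks:     {unique_chunk_share:.1%}")
```

With these numbers, deduplication removes roughly 86% of the already compressed data, and only about one chunk in ten is actually stored. The last monthly archive shows the same effect on a single run: 807.19 GB of original data added only 14.90 GB to the repository.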