On Mon, 2022-11-07 at 16:29 +0100, hede wrote:
> On 07.11.2022 02:57, hw wrote:
> > Hi,
> >
> > Is there no VDO in Debian, and what would be good to use for
> > deduplication with Debian? Why isn't VDO in the standard kernel?
> > Or is it?
>
> I have used vdo in Debian some time ago and don't remember big
> problems. AFAIR I did compile it myself - no prebuilt packages.

Cool, I could give that a try, ty.

> I switched to btrfs for other reasons. Not even for performance. The VDO
> layer eats performance, yes, but compared to naked ext4 even btrfs is
> slow.

Really? I never noticed that btrfs would be slow. But then, it's been a long
time since I used ext4 ...

> > There is no point in deduplicating backups after they're done because
> > I don't need to save disk space for them when I can fit them in the
> > first place.
>
> That's only one point.

What are the others?

> And it's not really a valid one, I think, as
> you do typically not run into space problems with one single action
> (YMMV). Running multiple sessions and out-of-band deduplication between
> them works for me.

That still requires you to have enough disk space for at least two full
backups. I can see it working for three backups because you can deduplicate
the first two, but not for two. And why would I deduplicate when I have
sufficient disk space?

> In-band deduplication (that's the one you want) has some drawbacks, too:
> High resource usage. You need plenty of RAM (up to several gigabytes
> per terabyte of storage) and write success is delayed (-> slow direct I/O).

Well, if it takes 5 days or so to make a backup, that won't be very useful.
It takes more than long enough already because my disks can only sustain so
much.

> For out-of-band deduplication there are multiple different
> implementations. File-based dedup on a directory basis can be very fast
> and resource economical, for example via rdfind or jdupes. Block-based
> dedup, like via bees for btrfs (that's the one I use), is closer to in-band
> deduplication (including high RAM usage). Bees can be switched off and
> on at any time (for example if it's a small home system which runs more
> demanding tasks from time to time) and switching it on again resumes at
> the last state (it starts at the last transaction id which was processed
> -> btrfs knows its transactions).

Hm.
I wouldn't mind running it from time to time, though I don't know that I would have a lot of duplicate data other than backups. How much space might I expect to gain from using bees, and how much memory does it require to run?
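
For the file-based case, I suppose one could get a rough idea of the possible gain before installing anything by hashing the files and adding up the bytes held by extra copies. A minimal Python sketch, assuming whole-file duplicates only: it says nothing about what a block-based tool like bees would find, it is not how rdfind or jdupes work internally, and the function names are made up for illustration.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def estimate_duplicate_bytes(root):
    """Return (total_bytes, reclaimable_bytes) for regular files under root.

    Assumption: only byte-identical whole files count as duplicates.
    Files that are already hard links to each other are counted once.
    """
    by_size = defaultdict(list)
    seen_inodes = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device nodes, ...
            inode = (st.st_dev, st.st_ino)
            if inode in seen_inodes:
                continue  # hard link to a file we already counted
            seen_inodes.add(inode)
            total += st.st_size
            by_size[st.st_size].append(path)

    reclaimable = 0
    for size, paths in by_size.items():
        # Files with a unique size cannot be duplicates, so they are never hashed.
        if size == 0 or len(paths) < 2:
            continue
        copies = defaultdict(int)
        for path in paths:
            copies[file_digest(path)] += 1
        for count in copies.values():
            reclaimable += (count - 1) * size  # keep one copy of each
    return total, reclaimable
```

Calling `estimate_duplicate_bytes()` on the backup directory returns the total size and the number of bytes that file-level deduplication could free at best. Grouping by size first keeps the hashing down to files that could actually be duplicates.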