On Sat, Aug 31, 2013 at 12:32:47AM +0100, Dmitrijs Ledkovs wrote:
> On 30 August 2013 20:55, Steven Chamberlain <ste...@pyro.eu.org> wrote:
> > Hi,
> >
> >> [...] using git instead of the file system for storing the contents
> >> of Debian Code Search. The hope was that it would lead to fewer disk
> >> seeks and less data due to gits delta-encoding
> >
> > Wouldn't ZFS be a more natural way to do something like this?
> >
> > A choice of gzip, lzjb and more recently lz4 compression; snapshots
> > and/or deduplication both reduce the amount of disk blocks and cache
> > memory needed.
> >
> > I've pondered before at this overlap in functionality between packing by
> > Git, and those features of the ZFS filesystem. They are doing much the
> > same thing but with different granularity. It would be neat if they
> > could work together better.
>
> I haven't finished packaging bedup - btrfs deduplication tool.

bedup is only a userspace tool that calculates per-file hashes, then uses
chattr tricks to avoid a race condition if some other process tries to
write to the file. git makes the first part unnecessary: the hashes are
already known. If you're the only writer, you don't need to care about
write races either.

> Anybody have benchmarked that, if that's any good and/or comparable to zfs
> deduplication?

It's an apples to microsofts comparison: zfs takes a massive amount of
memory to store block hashes. This has an upside: duplicated data never
hits the actual disk, and a downside: the memory cannot be used for
anything else, and if the hashes spill to disk, things become really slow.

With btrfs, unless you know the hash beforehand (git), deduplication works
after a write. This might be months later (one-shot), during the night
(cron) or, if *notify is used, a small fraction of a second later. btrfs
can enumerate recently changed blocks for you, so there's no need to read
the whole disk in that cron job.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
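
To illustrate the "calculate per-file hashes" step mentioned above, here is
a minimal sketch in Python. It is not bedup's actual code: the function
names (file_digest, find_duplicates), the choice of SHA-256 and the
size-based pre-filter are all assumptions made for illustration. It only
finds candidate duplicates; the filesystem-level step that makes the files
share extents, and the protection against concurrent writers, are left out.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root by content hash (hypothetical helper).

    Returns a sorted list of sorted path lists, one per set of files
    with identical content.
    """
    # Cheap pre-filter: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Hash only the files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return sorted(sorted(p) for p in by_hash.values() if len(p) > 1)
```

With content that is already stored in git, the hashing pass would not be
needed, since the object IDs are already content hashes.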