Timo Weingärtner <t...@tiwe.de> writes:

> Package: wnpp
> Severity: wishlist
> X-Debbugs-CC: debian-de...@lists.debian.org
>
> Package name: hadori
> Version: 0.2
> Upstream Author: Timo Weingärtner <t...@tiwe.de>
> URL: https://github.com/tiwe-de/hadori
> License: GPL3+
> Description: Hardlinks identical files
>  This might look like yet another hardlinking tool, but it is the only one
>  which only memorizes one filename per inode. That results in less memory
>  consumption and faster execution compared to its alternatives. Therefore
>  (and because all the other names are already taken) it's called
>  "HArdlinking DOne RIght".
>  .
>  Advantages over other hardlinking tools:
>   * predictability: arguments are scanned in order, each first version is kept
>   * much lower CPU and memory consumption
>   * hashing option: speedup on many equal-sized, mostly identical files
>
> The initial comparison was with hardlink, which got OOM killed with a hundred
> backups of my home directory. Last night I compared it to duff and rdfind
> which would have happily linked files with different st_mtime and st_mode.
>
> I need a sponsor. I'll upload it to mentors.d.n as soon as I get the bug
> number.
>
>
> Greetings
> Timo

I've been thinking about the problem of memory consumption too, but I've
come to a different solution, one that doesn't need memory at all.

Instead of remembering inodes, filenames and checksums, create a global
cache (e.g. a directory hierarchy like .cache/<start of hash>/<hash>) and
hardlink every file into it. If you want or need to take uid, gid, mtime
and mode into account, make them part of the .cache path. Garbage
collection in the cache is then just removing all files with a link count
of 1.

Going one step further, link files with a unique size [uid, gid, mtime, ...]
to .cache/<size> and change that into .cache/<size>/<start of hash>/<hash>
when you find a second file with the same size that isn't identical. That
saves the expensive hashing of clearly unique files.

You could also use a hash that computes its first byte from the first 4k,
the second byte from 64k, the third from 1MB and so on. That way you can
check whether the beginnings of two files match without having to checksum
the whole file or literally compare the two.

Kind regards
Goswin
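
To make the cache idea a bit more concrete, here is a rough Python sketch
of how it might look. This is only an illustration under several
assumptions, not how hadori or any existing tool works: the names
link_via_cache, collect_garbage and prefix_hash are made up, SHA-256 is an
arbitrary choice of hash, the cache has to live on the same filesystem as
the files, and uid, gid, mtime and mode are ignored. It also trusts the
hash; a careful tool would compare the file contents before linking. For
the staged hash I read "second byte from 64k" as "from the first 64k",
which is my interpretation.

```python
import hashlib
import os


def content_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's content."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_via_cache(cache_dir, path):
    """Link path into the cache, or replace path with a link to the cache entry.

    Returns True if path was replaced by a hardlink to an existing entry.
    """
    h = content_hash(path)
    target = os.path.join(cache_dir, h[:2], h)  # .cache/<start of hash>/<hash>
    os.makedirs(os.path.dirname(target), exist_ok=True)
    try:
        os.link(path, target)  # first file with this content becomes the entry
        return False
    except FileExistsError:
        pass
    if os.path.samefile(path, target):
        return False  # already linked to the cache entry
    # Simplification: assumes this temporary name is free.
    tmp = path + ".hardlink-tmp"
    os.link(target, tmp)
    os.replace(tmp, path)  # atomically swap the duplicate for the link
    return True


def collect_garbage(cache_dir):
    """Remove cache entries that nothing else links to (link count 1)."""
    removed = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            entry = os.path.join(root, name)
            if os.stat(entry).st_nlink == 1:
                os.unlink(entry)
                removed += 1
    return removed


def prefix_hash(path, levels=3):
    """Byte i of the result depends on the first 4 KiB * 16**i bytes."""
    digest = hashlib.sha256()
    out = bytearray()
    limit, consumed = 4096, 0
    with open(path, "rb") as f:
        for _ in range(levels):
            while consumed < limit:
                chunk = f.read(min(limit - consumed, 1 << 16))
                if not chunk:
                    break  # file is shorter than this level's prefix
                digest.update(chunk)
                consumed += len(chunk)
            out.append(digest.copy().digest()[0])
            limit *= 16  # 4 KiB, 64 KiB, 1 MiB, ...
    return bytes(out)
```

Note that nothing is kept in memory between files here: the filesystem
itself is the lookup table, at the cost of one extra directory entry per
distinct file content. With prefix_hash, two files whose first 4k are
identical get the same first byte, so a mismatch in an early byte rules out
a pair without reading the rest.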