On Tue, Mar 24, 2009 at 03:11:17PM +0100, Jerome Warnier <jwarn...@beeznest.net> wrote:
> Mike Hommey wrote:
> > On Tue, Mar 24, 2009 at 02:34:09PM +0100, Jerome Warnier
> > <jwarn...@beeznest.net> wrote:
> >
> >> Giacomo A. Catenazzi wrote:
> >>
> >>> Jerome Warnier wrote:
> >>>
> >>>> Raphael Hertzog wrote:
> >>>>
> >>>>> On Tue, 24 Mar 2009, Jerome Warnier wrote:
> >>>>>
> >>>>>> For files from packages, though, deduplication might be a good idea, as
> >>>>>> dpkg is supposedly the only one to ever modify the files (under /usr for
> >>>>>> example).
> >>>>>> I don't know however how dpkg treats hardlinks. Does it "break" the
> >>>>>> hardlink before replacing a file or does it replace the file whatever
> >>>>>> its real nature is?
> >>>>>>
> >>>>> IIRC dpkg preserves hardlinks inside a binary package but I don't see how
> >>>>> it could do the same across multiple binary packages.
> >>>>>
> >>>> Oh, I didn't expect it to. I just wanted to know its behaviour when it
> >>>> upgrades a package.
> >>>> Before the upgrade, the file is a hardlink (because I hardlinked it
> >>>> manually), then it tries to upgrade the file/hardlink. Does it "break"
> >>>> the hardlink* before upgrading the file or does it overwrite the
> >>>> file/hardlink and all of its "siblings"?
> >>>>
> >>> Do you really care? (not theoretically, but in normal use).
> >>> I would expect that same content will be delivered:
> >>> - by "brother" packages (same source), thus usually updated
> >>>   at the same time.
> >>> - in documentation (so maybe not so important for your use).
> >>>
> >>> I think the most problem are in files outside "dpkg" control,
> >>> i.e. /var and /etc.
> >>>
> >>> I'm just curious: do you have a list of "same" content files?
> >>> maybe I'm completely wrong.
> >>>
> >> Here you are, for /usr on a typical Lenny AMD64 server (generated with
> >> "finddup -n" from package perforate):
> >> http://glouglou.beeznest.org/~jwarnier/usr-duplicates.list.gz
> >>
> >
> > $ zcat usr-duplicates.list.gz | awk '{t+=$1*(NF-2)}END{print t}'
> > 33142129
> >
> > You would free 33MB. How big is your disk ? Is it worth bothering ?
> >
> I'm not an awk god, but isn't that supposed to just be the total size of
> the files it could take if deduplicated?
> In this case, it is not the size I would reclaim, as there are sometimes
> up to 4 copies of the same content.

The "*(NF-2)" part takes care of those copies.

Mike


--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
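
For anyone else puzzled by the "*(NF-2)" factor, here is a minimal sketch of the arithmetic. It assumes each line of the list is a size followed by the whitespace-separated paths of all files sharing that content. That layout is inferred from the awk expression above, not checked against the finddup documentation, and paths containing spaces would throw the count off. The sample data and the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample in the assumed "size path1 path2 ..." layout:
# one group of 2 identical files of 100 bytes, one group of 4 of 50 bytes.
sample='100 /usr/share/doc/a/copyright /usr/share/doc/b/copyright
50 /usr/lib/x /usr/lib/y /usr/lib/z /usr/lib/w'

# Bytes freed by hardlinking: NF counts the size field plus every path,
# so NF-1 is the number of copies and NF-2 the number of redundant ones
# (one copy has to stay on disk).
reclaimable() {
    awk '{ t += $1 * (NF - 2) } END { print t + 0 }'
}

# Bytes currently occupied by all copies, for comparison.
occupied() {
    awk '{ t += $1 * (NF - 1) } END { print t + 0 }'
}

printf '%s\n' "$sample" | reclaimable   # prints 250 (100*1 + 50*3)
printf '%s\n' "$sample" | occupied      # prints 400 (100*2 + 50*4)
```

The difference, 150 bytes, is what the two groups would occupy once deduplicated, so under the assumed layout the 33142129 figure is the space reclaimed, not the space left in use.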