On Fri, Nov 15, 2013 at 01:50:05PM +0000, Jonathan Dowland wrote:
> I'm not sure that making a general rule based on an edge-case is a
> good idea.  Publican is not very popular at all, it's quite likely
> that none of the 70 or so people who have installed it have done
> anything unusual with mounts around /usr.

publican is just an example. You can find more packages employing the
same technique at
http://lintian.debian.org/tags/package-contains-hardlink.html.

But we should look not only at packages doing this, but also at
packages that are wasting precious mirror and disk space[1]:

binary package                  #files  #bytes

wims-extra-all                  11057   44415092
mixxx-data                      7302    8055125
widelands-data                  6692    12953306
code-aster-test                 3225    59938595
sofia-sip-doc                   3146    6848743
mailman                         1745    2007439
texlive-lang-cjk                1619    4986872
spikeproxy                      1602    5934959
acl2-doc                        1598    7209512
freefoam-dev-doc                1495    3145120
wims                            1458    2125970
triplea                         1340    8641063
libqt4-dev                      1337    5003042
libboost1.54-doc                1240    4131392
libgrib-api-1.10.4              1210    1678922
lazarus-doc-1.0.10              1174    10734571
python-matplotlib-doc           1172    24691971
fonts-mathjax-extras            1136    141683
libboost1.53-doc                1097    3717938
dotlrn                          1096    5046637
libboost1.49-doc                1091    3578000
gnat-4.4                        1083    10643007
openclipart2-libreoffice        1046    2142208
sql-ledger                      1041    9248930
esys-particle                   1025    8243181
typo3-src-4.5                   1019    1528729
texlive-fonts-extra             998     4687576
moodle                          959     6392249
openbox-themes                  926     200312
xfwm4-themes                    890     412192
grass-dev-doc                   832     1124116
phpbb3-l10n                     825     623634
fillets-ng-data                 818     2712929
tuxpaint-stamps-default         813     2824876
optgeo                          793     2681882
libbcel-java-doc                760     17640174
publican                        750     5283082
msp430mcu                       737     14475576
freegish-data                   691     1252457
collabtive                      687     1419645
fp-docs-2.6.2                   683     2111629
libmapi-dev                     681     31188
libnb-platform13-java-doc       678     1349378
murrine-themes                  656     255650
ctpp2-doc                       642     699880
fvwm-crystal                    634     800295
pacemaker-dev                   628     1399352
libknopflerfish-osgi-java-doc   598     4134711
libreoffice-dmaths              588     905010
freefoam-user-doc               587     883850

The numbers above are the savings achievable by using links (file count
and size in bytes). A few of those files cannot be hard linked because
they cross common file system boundaries. Still, the projected savings
are significant. Clearly, a generic solution is desirable. If you are
interested in details on the savings of a particular package, visit
http://dedup.debian.net/compare/<package>/<package>. Roughly every 25th
file in the archive is duplicated within the same package. That's almost
1% of the uncompressed archive size.
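
For anyone who wants to reproduce such numbers for a single unpacked
package without access to the dedup database, here is a minimal sketch.
It is not the dedup.debian.net implementation: the function name
duplicate_savings is made up for illustration, and counting "all copies
but one" per sha512 digest is my assumption about what the #files and
#bytes columns mean.

```python
import hashlib
import os
from collections import defaultdict


def duplicate_savings(root):
    """Return (files, bytes) that linking duplicates under root could save.

    Hypothetical helper: every copy of a file beyond the first one with
    the same sha512 digest is counted as saveable.
    """
    groups = defaultdict(list)  # sha512 digest -> sizes of files with that content
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files are considered; symlinks are skipped.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                digest = hashlib.sha512(handle.read()).hexdigest()
            groups[digest].append(os.path.getsize(path))
    # Files that are already hard linked to each other are counted too,
    # so the result is an upper bound for an already deduplicated tree.
    files = sum(len(sizes) - 1 for sizes in groups.values())
    saved = sum(sizes[0] * (len(sizes) - 1) for sizes in groups.values())
    return files, saved
```

Calling duplicate_savings() on the directory produced by
"dpkg-deb -x foo.deb somedir" gives a rough per-package figure.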

> Looking at publican a number of questions occur to me
> 
>  * why hardlink all of the contents of
>    /usr/share/doc/publican/Users_Guide/desktop/$LOCALE/Common_Content
>    together rather than symlink them to some common directory like
>    /usr/share/publican/Common_Content? Is it because there might be
>    additions or omissions across locales?

Because it is more work to do so. One of the big advantages of using
hard links is that you don't have to choose a "primary location". These
hard links are generated at package build time.
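
To illustrate why no "primary location" is needed, here is a rough
sketch of what such a build-time step could look like. It is not what
publican or any Debian helper actually runs; hardlink_duplicates and the
".hardlink-tmp" suffix are invented for this example, and it assumes
that files with identical content may also share mode, owner and
timestamps.

```python
import hashlib
import os


def hardlink_duplicates(root):
    """Replace duplicate regular files under root with hard links.

    Hypothetical helper; returns the number of files that were relinked.
    """
    first_seen = {}  # (device, sha512 digest) -> first path with that content
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                digest = hashlib.sha512(handle.read()).hexdigest()
            # Hard links cannot cross file systems, so group per device.
            key = (os.stat(path).st_dev, digest)
            original = first_seen.setdefault(key, path)
            if original == path or os.path.samefile(original, path):
                continue  # first copy, or already linked to it
            # Link under a temporary name, then rename over the duplicate,
            # so the path never disappears in between.
            tmp = path + ".hardlink-tmp"
            os.link(original, tmp)
            os.replace(tmp, path)
            linked += 1
    return linked
```

Whichever copy is visited first simply becomes the link target; after
the run all names are equal, which is the point made above.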

>    * Can/should that not be handled within the tool itself (implement
>      a multi-directory lookup process)

Again this is more work. It might be possible in the case of publican,
but if you look at the list above, you'll quickly notice that this
approach doesn't scale.

Is there any technical reason for rejecting the usage of hard links in
binary packages besides common file system boundaries?

In any case, clarifying and documenting whether cross-directory hard
links are a tool to be used seems worthwhile to me:
 * Either they are to be avoided at all costs, then we have a handful
   of violations to be fixed,
 * or they are a tool that can be used to significantly shrink mirror
   and installation size at very little effort.

Helmut

[1] ssh delfin.debian.org sqlite3 /srv/dedup.debian.org/dedup.sqlite3
    '"SELECT package.name, sharing.files, sharing.size FROM package JOIN
    sharing JOIN function WHERE sharing.pid1 = package.id AND
    sharing.pid2 = package.id AND sharing.fid1 = function.id AND
    sharing.fid2 = function.id AND function.name = \"sha512\" ORDER BY
    sharing.files DESC LIMIT 50;"'

