On Fri, Nov 15, 2013 at 01:50:05PM +0000, Jonathan Dowland wrote:
> I'm not sure that making a general rule based on an edge-case is a
> good idea. Publican is not very popular at all, it's quite likely
> that none of the 70 or so people who have installed it have done
> anything unusual with mounts around /usr.

publican is just an example. You can find more packages employing the
same technique at
http://lintian.debian.org/tags/package-contains-hardlink.html. But we
should look not only at packages that already do this, but also at
packages that are wasting precious mirror and disk space[1]:

binary package                #files    #bytes
wims-extra-all                 11057  44415092
mixxx-data                      7302   8055125
widelands-data                  6692  12953306
code-aster-test                 3225  59938595
sofia-sip-doc                   3146   6848743
mailman                         1745   2007439
texlive-lang-cjk                1619   4986872
spikeproxy                      1602   5934959
acl2-doc                        1598   7209512
freefoam-dev-doc                1495   3145120
wims                            1458   2125970
triplea                         1340   8641063
libqt4-dev                      1337   5003042
libboost1.54-doc                1240   4131392
libgrib-api-1.10.4              1210   1678922
lazarus-doc-1.0.10              1174  10734571
python-matplotlib-doc           1172  24691971
fonts-mathjax-extras            1136    141683
libboost1.53-doc                1097   3717938
dotlrn                          1096   5046637
libboost1.49-doc                1091   3578000
gnat-4.4                        1083  10643007
openclipart2-libreoffice        1046   2142208
sql-ledger                      1041   9248930
esys-particle                   1025   8243181
typo3-src-4.5                   1019   1528729
texlive-fonts-extra              998   4687576
moodle                           959   6392249
openbox-themes                   926    200312
xfwm4-themes                     890    412192
grass-dev-doc                    832   1124116
phpbb3-l10n                      825    623634
fillets-ng-data                  818   2712929
tuxpaint-stamps-default          813   2824876
optgeo                           793   2681882
libbcel-java-doc                 760  17640174
publican                         750   5283082
msp430mcu                        737  14475576
freegish-data                    691   1252457
collabtive                       687   1419645
fp-docs-2.6.2                    683   2111629
libmapi-dev                      681     31188
libnb-platform13-java-doc        678   1349378
murrine-themes                   656    255650
ctpp2-doc                        642    699880
fvwm-crystal                     634    800295
pacemaker-dev                    628   1399352
libknopflerfish-osgi-java-doc    598   4134711
libreoffice-dmaths               588    905010
freefoam-user-doc                587    883850

The numbers above are the savings achievable by using links. A few of
those files cannot be hard linked, because the links would cross common
file system boundaries. Still, the projected savings are significant.
Clearly, a generic solution is desirable.

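To make the idea of a generic solution a bit more concrete, here is a
minimal Python sketch of how duplicates inside one unpacked package tree
could be found and hard linked. It is an illustration only, not the code
behind dedup.debian.net or any existing Debian tool. The function names
(sha512_of, find_duplicates, link_duplicates) are hypothetical, and the
way savings are counted (every copy beyond the first counts as one saved
file) is an assumption that may not match the table above. The sketch
also ignores owner, mode and timestamps, which a real tool would have to
compare, since hard links share a single inode.

```python
import hashlib
import os
from collections import defaultdict


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group regular files under root by content; keep groups of 2 or more."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files are candidates; symlinks are left alone.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            groups[sha512_of(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def link_duplicates(root, dry_run=True):
    """Return (files, bytes) saved by hard linking duplicates under root.

    With dry_run=True nothing is modified; the savings are only projected.
    """
    saved_files = 0
    saved_bytes = 0
    for paths in find_duplicates(root):
        keeper = paths[0]  # arbitrary choice: no "primary location" needed
        keeper_stat = os.stat(keeper)
        for path in paths[1:]:
            stat = os.stat(path)
            if stat.st_dev != keeper_stat.st_dev:
                continue  # hard links cannot cross file system boundaries
            if stat.st_ino == keeper_stat.st_ino:
                continue  # already linked to the keeper
            if not dry_run:
                # Link under a temporary name, then rename over the copy.
                tmp = path + ".hardlink-tmp"
                os.link(keeper, tmp)
                os.replace(tmp, path)
            saved_files += 1
            saved_bytes += stat.st_size
    return saved_files, saved_bytes
```

Calling link_duplicates("debian/tmp") with the default dry_run=True only
reports the projected savings; the path is just an example. The st_dev
comparison only detects boundaries that exist on the build machine, so it
says nothing about how a user has split up /usr at install time.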
If you are interested in details on the savings of a particular package,
visit http://dedup.debian.net/compare/<package>/<package>. Roughly every
25th file in the archive is duplicated within the same package. That's
almost 1% of the uncompressed archive size.

> Looking at publican a number of questions occur to me
>
> * why hardlink all of the contents of
>   /usr/share/doc/publican/Users_Guide/desktop/$LOCALE/Common_Content
>   together rather than symlink them to some common directory like
>   /usr/share/publican/Common_Content? Is it because there might be
>   additions or omissions across locales?

Because symlinking is more work. One of the big advantages of using hard
links is that you don't have to choose a "primary location". These hard
links are generated at package build time.

> * Can/should that not be handled within the tool itself (implement
>   a multi-directory lookup process)

Again, this is more work. It might be possible in the case of publican,
but if you look at the list above, you'll quickly notice that this
approach doesn't scale.

Is there any technical reason for rejecting the usage of hard links in
binary packages besides common file system boundaries?

In any case, clarifying and documenting whether cross-directory hard
links are a tool to be used seems worthwhile to me.

 * Either they are to be avoided at all costs, in which case we have a
   handful of violations to fix,
 * or they are a tool that can be used to significantly shrink mirror
   and installation size with very little effort.

Helmut

[1] ssh delfin.debian.org sqlite3 /srv/dedup.debian.org/dedup.sqlite3
    '"SELECT package.name, sharing.files, sharing.size
    FROM package JOIN sharing JOIN function
    WHERE sharing.pid1 = package.id AND sharing.pid2 = package.id
    AND sharing.fid1 = function.id AND sharing.fid2 = function.id
    AND function.name = \"sha512\"
    ORDER BY sharing.files DESC LIMIT 50;"'

Archive: http://lists.debian.org/20131115204700.ga10...@alf.mars