Thomas Goirand wrote:
> But good luck to teach good practices upstream. See Ross's reply: 120
> packages are depending on this.

It's more than that. Given tooling that doesn't have excessive overhead for small packages, why call such packages "bad practices" in the first place?

> Though it is also my view that packaging tiny stuff shouldn't be a
> problem. If it is, then we should fix whatever it is that is problematic
> in Debian infra.

Agreed.

Let's consider what overhead exists for a Debian package, and what we could potentially reduce or remove, using node-defined as an example. (Obviously any such changes to metadata may require a full Debian release to propagate changes to tools like apt and dpkg.) To make redundancy more evident, I'll include everything first before discussing any of it.

First, an entry in Sources that looks like this, for each Debian suite (unstable/testing/stable/oldstable):

Package: node-defined
Binary: node-defined
Version: 1.0.0-1
Maintainer: Debian Javascript Maintainers <pkg-javascript-de...@lists.alioth.debian.org>
Uploaders: Ross Gammon <rossgam...@mail.dk>
Build-Depends: debhelper (>= 9), dh-buildinfo, nodejs
Architecture: all
Standards-Version: 3.9.6
Format: 3.0 (quilt)
Files:
 43ab019e6b53b9f4d4ff338027cb351d 1997 node-defined_1.0.0-1.dsc
 978d30ee28482aa7812f74f812b1899f 2334 node-defined_1.0.0.orig.tar.gz
 557f4bcec8a449608e50d09ba69bd224 2416 node-defined_1.0.0-1.debian.tar.xz
Vcs-Browser: https://anonscm.debian.org/cgit/pkg-javascript/node-defined.git
Vcs-Git: git://anonscm.debian.org/pkg-javascript/node-defined.git
Checksums-Sha1:
 02cb2027e3218b93fd856a5e3b68134fe01e47c1 1997 node-defined_1.0.0-1.dsc
 eff888bf76f9cfcca2b94e39c470a6c1441b3f03 2334 node-defined_1.0.0.orig.tar.gz
 7237a9a8aee2add44a9d8bb0dae382c3f0a923cf 2416 node-defined_1.0.0-1.debian.tar.xz
Checksums-Sha256:
 4aa2a079bc7119678c58643def268e4789b56a6a40b2931601de527244a1def8 1997 node-defined_1.0.0-1.dsc
 d953e6e9fe9277cc6e68e5bb36a299d8f3505f8facd3468ab7edc7d6858d293a 2334 node-defined_1.0.0.orig.tar.gz
 56ede623ee7929fcb334fa7459c3e3f43b529bf2b585866d5ebc9ee06cc3d03d 2416 node-defined_1.0.0-1.debian.tar.xz
Homepage: https://github.com/substack/defined
Package-List:
 node-defined deb web optional arch=all
Testsuite: autopkgtest
Directory: pool/main/n/node-defined
Priority: extra
Section: misc

Second, an entry in *each architecture's* Packages file like this, for each Debian suite:

Package: node-defined
Version: 1.0.0-1
Installed-Size: 19
Maintainer: Debian Javascript Maintainers <pkg-javascript-de...@lists.alioth.debian.org>
Architecture: all
Depends: nodejs
Description: return the first argument that is `!== undefined`
Homepage: https://github.com/substack/defined
Description-md5: b4200f8f2e989c1354c3c1cb3677e663
Section: web
Priority: optional
Filename: pool/main/n/node-defined/node-defined_1.0.0-1_all.deb
Size: 3292
MD5sum: d5a08f2219b4128a49be206caeb5b8b4
SHA1: 115317d45d5028203269d84aa07c447d7c12ea7b
SHA256: 5be875d209afc69aa2d6be10bbed3c514e75f0a5e8d5a769a6461f42ab6db581

(Note that a source package with multiple binary packages would have multiple such entries.)

Third, an entry in Translation-en (and every other translation), for each Debian suite:

Package: node-defined
Description-md5: b4200f8f2e989c1354c3c1cb3677e663
Description-en: return the first argument that is `!== undefined`
 Most of the time when you chain together ||s, you actually just want the first item that is not undefined, not the first non-falsy item.
 .
 This module is like the defined-or (//) operator in perl 5.10+.
 .
 Node.js is an event-based server-side JavaScript engine.

Fourth, the source package .dsc file:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 3.0 (quilt)
Source: node-defined
Binary: node-defined
Architecture: all
Version: 1.0.0-1
Maintainer: Debian Javascript Maintainers <pkg-javascript-de...@lists.alioth.debian.org>
Uploaders: Ross Gammon <rossgam...@mail.dk>
Homepage: https://github.com/substack/defined
Standards-Version: 3.9.6
Vcs-Browser: https://anonscm.debian.org/cgit/pkg-javascript/node-defined.git
Vcs-Git: git://anonscm.debian.org/pkg-javascript/node-defined.git
Testsuite: autopkgtest
Build-Depends: debhelper (>= 9), dh-buildinfo, nodejs
Package-List:
 node-defined deb web optional arch=all
Checksums-Sha1:
 eff888bf76f9cfcca2b94e39c470a6c1441b3f03 2334 node-defined_1.0.0.orig.tar.gz
 7237a9a8aee2add44a9d8bb0dae382c3f0a923cf 2416 node-defined_1.0.0-1.debian.tar.xz
Checksums-Sha256:
 d953e6e9fe9277cc6e68e5bb36a299d8f3505f8facd3468ab7edc7d6858d293a 2334 node-defined_1.0.0.orig.tar.gz
 56ede623ee7929fcb334fa7459c3e3f43b529bf2b585866d5ebc9ee06cc3d03d 2416 node-defined_1.0.0-1.debian.tar.xz
Files:
 978d30ee28482aa7812f74f812b1899f 2334 node-defined_1.0.0.orig.tar.gz
 557f4bcec8a449608e50d09ba69bd224 2416 node-defined_1.0.0-1.debian.tar.xz

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJWKj8IAAoJEPNPCXROn13ZrhwP/1+FQtC5NIM1SAWj8capx3Sm
rdLtO29o+M7mSaiN7c10IYn+OXFu+AMFikVnD4+6Jzj3qtWfk6sgRWBsU2IXQ9Br
xUj8pskB5t2Ti8aAzoId3wKxgOL9JF9u6b7MzkER1WOXOOMjmT16OASRjx1vmJSh
OrDHJKJN2n8KIoJerWQ3d9GazCFgQZ3HfgDXWUeupkWG8emGoyvScpscsab1Mdq9
BvA5X5k4XCGalIeEXAbrx4wR6dHLfldEY/K0g3RyLmicPZbcHeMeaEBOSRkIsr7W
yjzcdz7T2TAdbxG1ZOzumDcpEUKEZDSKFZwysaccyGpPsts9ZGYU5HeM5MdwJzKc
6fobHtC+ARzgcp8Fxq1xitO/zfQnJA5eUbWMykjLsf8LeZI0/g1VFVnxv2cfSHNP
dh/OrGNtPAWaJgwsb/LwR2d+WinAYMocTO6n9D3ONyV6OrvVi81fRWcp24Mo4rDH
0oDG4vaZyeyKyDenHJzFCm2AlZ7pnosFx96aIHOmEeMwE0/xMedjzaE1sbWd4/Ma
rf6xOVI+Tqj+YYMLLC6+dP6gNzx3qTTBBOVijotllxNGzjKUgOR0jP0RSsNyXAW/
QPYz5aftp0icn5nEEeXfjfrcclOtrAAGH7wiMKiNT99YgI/zJwHetBuESVBJ3OOT
XZmmN/c/EAnp8AWFYJuy
=U3sK
-----END PGP SIGNATURE-----

We can skip the .orig.tar.gz; that's the package itself, not overhead.

Fifth, the contents of the .debian.tar.xz:

debian/
debian/tests/
debian/tests/require
debian/tests/control
debian/docs
debian/upstream/
debian/upstream/metadata
debian/watch
debian/copyright
debian/examples
debian/changelog
debian/control
debian/compat
debian/rules
debian/install
debian/source/
debian/source/format
debian/gbp.conf

Of those, the files with the most significant overhead or duplication include debian/control, debian/changelog, debian/copyright, debian/tests/control (could be reduced or eliminated via conventions), debian/gbp.conf, and debian/upstream/metadata. (Some of the rest could be reduced or eliminated via conventions as well, though.)

And sixth, the files in the .deb:

drwxr-xr-x root/root         0 2015-10-23 06:59 ./
drwxr-xr-x root/root         0 2015-10-23 06:59 ./usr/
drwxr-xr-x root/root         0 2015-10-23 06:59 ./usr/share/
drwxr-xr-x root/root         0 2015-10-23 06:59 ./usr/share/doc/
drwxr-xr-x root/root         0 2015-10-23 06:59 ./usr/share/doc/node-defined/
-rw-r--r-- root/root       158 2015-10-21 07:27 ./usr/share/doc/node-defined/changelog.Debian.gz
-rw-r--r-- root/root      1442 2015-10-21 07:27 ./usr/share/doc/node-defined/copyright
drwxr-xr-x root/root         0 2015-10-23 06:59 ./usr/share/doc/node-defined/examples/
-rw-r--r-- root/root       123 2015-03-30 15:47 ./usr/share/doc/node-defined/examples/defined.js
-rw-r--r-- root/root      1082 2015-03-30 15:47 ./usr/share/doc/node-defined/readme.markdown
drwxr-xr-x root/root         0 2015-10-23 06:59 ./usr/lib/
drwxr-xr-x root/root         0 2015-10-23 06:59 ./usr/lib/nodejs/
drwxr-xr-x root/root         0 2015-10-23 06:59 ./usr/lib/nodejs/defined/
-rw-r--r-- root/root      1094 2015-03-30 15:47 ./usr/lib/nodejs/defined/package.json
-rw-r--r-- root/root       150 2015-03-30 15:47 ./usr/lib/nodejs/defined/index.js

The files in /usr/lib/nodejs are the contents of the package; they don't count. Examples and the upstream readme are at least arguably useful.
However, copyright and changelog.Debian.gz are Debian overhead.

In this and all following discussions, given the use of compression, we can mostly assume that field names in aggregated control files take almost no space; their existence at all does add a tiny amount of overhead, but we wouldn't save any space by reducing the lengths of field names, only by eliminating fields entirely or reducing their unique content.

However, as far as I can tell, we could reduce the per-package overhead and redundancy quite a bit. A few examples:

"Binary" seems a bit excessive for several reasons. First, it seems redundant with the "Source" entries in Packages files; we don't necessarily need a two-way cross-reference at all here. And second, we could assume that a missing entry means "same as Package". That rule (source equals binary) would work for 13364 of 24097 packages in Debian today, and potentially more if other single-binary packages ensured their source and binary names matched.

For that matter, Binary and Package-List seem redundant. (And Package-List doesn't seem like end-user metadata; it seems like something only the Debian infrastructure needs.)

Many fields, such as Maintainer and Uploaders, seem unnecessary to extract into aggregated files; their presence in *one* place should suffice. Developer tools just need these in debian/control in the source package. The Debian infrastructure does need them, but end-users mostly don't, and they especially don't need to download them as part of the aggregated package metadata.

Do we really need fields like Build-Depends, Testsuite, or Standards-Version pulled out of the package itself and placed into the Sources file? Why do we need to read those without the source package? (Note that tools that form part of Debian infrastructure could work from UDD or similar; the question is why those fields are needed on an end-user system that downloaded the Sources file.)

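To make the "missing Binary means same as Package" rule above concrete, here is a minimal Python sketch. It is only an illustration, not how apt or dak work: the tiny stanza parser, the helper names (parse_stanzas, binaries, binary_field_is_redundant), and the second sample entry (example-src) are all made up for this example.

```python
# Sketch: treat a missing Binary field in a Sources stanza as "same as Package".
# The parser below is a deliberately small stand-in for a real deb822 parser.

SAMPLE = """\
Package: node-defined
Binary: node-defined
Version: 1.0.0-1

Package: example-src
Binary: example-bin, example-doc
Version: 0.1-1
"""  # example-src is a hypothetical multi-binary source package


def parse_stanzas(text):
    """Split control-file text into a list of {field: value} dicts."""
    stanzas, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, last = {}, None
        elif line[0] in " \t" and last is not None:
            # Continuation line: append to the previous field.
            current[last] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            last = name.strip()
            current[last] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def binaries(stanza):
    """Binary package names; fall back to the source name if Binary is absent."""
    raw = stanza.get("Binary")
    if raw is None:
        return [stanza["Package"]]
    return [n.strip() for n in raw.replace("\n", " ").split(",") if n.strip()]


def binary_field_is_redundant(stanza):
    """True when Binary carries nothing beyond the source package name."""
    return binaries(stanza) == [stanza["Package"]]


stanzas = parse_stanzas(SAMPLE)
for stanza in stanzas:
    print(stanza["Package"], binary_field_is_redundant(stanza))
```

With this default, the Binary line could be dropped from the node-defined entry without losing information, while the hypothetical multi-binary entry would still need it.
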
Files, Checksums-Sha1, and Checksums-Sha256 are clearly redundant; has it been long enough that we can drop the first two yet? Now that we use a secure hash, do we really need the sizes in those fields? Furthermore, we could generate the filenames from the source name and version. And finally, all but the dsc seem redundant with fields in the dsc. So we could really reduce this down to a secure checksum of the dsc.

Homepage really doesn't need to live in both files.

Format doesn't need pulling out; tools could just parse that from the dsc file.

Directory seems entirely derivable; if we want to support a variety of repository layouts, we could put repository layout information into the Release file.

Priority and Section seem not only redundant between source and binary, but actively wrong: note that they differ in this case, yet the source builds only one binary. extra/misc seems wrong; optional/web seems correct (at least until we establish a "js" section).

In the Packages files for binaries, we could eliminate a *massive* amount of redundancy by having a dedicated Packages file for "all", to avoid duplicating entries into every architecture's Packages file. That should not significantly increase overhead for end-users, and for any user of multiarch it'll decrease overhead. A quick check on amd64 shows that splitting out "all" into a separate Packages file would not change the combined uncompressed size at all, should not change the pdiff size at all, and would increase the combined compressed full-download size by 94k, from 9957k to 10051k, an increase of less than 1%. That seems reasonable in exchange for eliminating 12 duplicate copies of the 4396k used for "all" Packages files, times suites (oldstable/stable/testing/unstable/experimental), and that doesn't even count unofficial architectures, or snapshot.debian.org.

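As a quick sanity check of the arithmetic in that trade-off, using only the figures quoted above (and assuming the 4396k figure is per suite, on whatever basis it was measured), a few lines of Python:

```python
# Figures quoted above for amd64, all in k.
combined_before = 9957   # compressed full download today
combined_after = 10051   # compressed full download with "all" split out
arch_all_size = 4396     # size of the "all" entries, per copy
duplicate_copies = 12    # duplicate copies eliminated, per suite

extra_download = combined_after - combined_before
extra_percent = 100 * extra_download / combined_before
saved_per_suite = duplicate_copies * arch_all_size

print(f"extra per-user download: {extra_download}k ({extra_percent:.2f}%)")
print(f"duplicate data avoided per suite: {saved_per_suite}k")
# extra per-user download: 94k (0.94%)
# duplicate data avoided per suite: 52752k
```

So the cost is under 1% per full download, against roughly 52M of duplicated metadata per suite on the archive side.
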
The same split would work for translated descriptions, except that there we should share descriptions across architectures by default, even for arch-specific packages. Almost no packages have descriptions that vary by architecture.

For Packages, we have a similar waste of space storing md5 and sha1 hashes, and .deb package size. Likewise for dsc files, in addition to the mostly derivable filenames.

For translated descriptions, Package and Description-md5 seem redundant.

In the dsc, we have a similar redundancy between Source, Binary, and Package-List. And even if the section/priority/etc information made sense in Package-List, it gets overridden with the canonical information provided by the archive.

That's not even getting into the more controversial items, like debian/changelog (a vestige of a pre-VCS era), or debian/copyright. Or more fundamental changes, like stuffing absolutely everything into a single git repository for deduplication and incremental downloads.

- Josh Triplett