[ Obviously this “summary” could be considered biased, but I do think the facts presented are accurate. ]
Hi,

The two reasons for the shared / reference counted files (refcnt from
now on) implementation in dpkg have been:

 * To avoid massive package proliferation (due to the mandated
   copyright and changelog files), thus the work involved in a one time
   split and the size increase in Packages indices.
 * To avoid unneeded file duplication, thus wasted space (due to those
   mandated files, but also partially just as a consequence of not
   splitting files into new arch:all packages, per above).

This has the following implications:

 * Deploying refcnt means that M-A:same packages must always be at the
   same exact installed version, so that the file contents can match.
   ↓ More difficult upgrade paths, as this ties the different arch
   dependency trees around M-A:same barriers.

 * binNMUs need to be performed in lockstep for *all* architectures,
   because the installed versions need to match.
   ↓ Causing useless buildd usage and user downloads for arches not
   affected. “Fixing” this by making dpkg treat binNMU versions
   specially, besides being just another special case needed for
   M-A:same packages, would be wrong, as arch-indep content can
   actually change between builds, e.g. generated documentation.

 * binNMUs for the same version might not be co-installable because
   doc generators, compressors, etc, might not always produce the same
   output.
   ↓ This is a pretty fragile thing to rely on. New architectures or
   local builds might have a hard time if generated output changed in
   the past. A possible fix, but only for the compressed files case,
   might be to ship them uncompressed, but that counters the desire to
   reduce wasted space.

 * binNMUs for the same version cannot be co-installed anyway as their
   changelogs differ.
   ↓ That could be “fixed” by using the same email address and a
   hardcoded date, or not including the binNMU entry at all, or moving
   that entry to a new field, etc. All of which seem like ugly hacks,
   or a possible loss of information.

 * It means special casing M-A:same on identical file conflict.
   ↓ The same thing could be argued to be made possible for packages
   generated from the same source, a “problem” we've always had and
   managed just fine up to now with changes at the packaging level.

 * Once implemented, this “feature” cannot be taken out, *ever*.
   ↓ Because it will produce installation errors, and a long transition
   would not help because that would not guarantee external or old
   packages are fine.

Conclusion
----------

The above means that binNMUs will currently be unusable for any source
package building an M-A:same package, making the release team's job
harder, or requiring sourceful uploads by maintainers instead.

Given the numbers seen on this thread, the estimated amount of new
required packages to be split off is actually pretty low (less than 2%
of the current total). New arch:all packages should be considered cheap
as long as the payload weighs more than the metadata and the binary
format itself, and they should generally reduce archive and multiarch
DVD space usage. For Packages indices there are pdiffs, which (although
not currently optimally implemented) should only get downloaded on
specific package updates, and Descriptions are only downloaded once
nowadays. And these are nothing compared to the amount of new packages
pulled in per each foreign arch configured.

It's been mentioned that splitting packages is a daft idea because it
causes more burden to library package maintainers while dpkg could do
the job once instead, but this is a progressive one time thing, while
the above implications are *forever*, and if maintainers are required
to do sourceful uploads instead of getting binNMUs done it actually
means it's going to be even more of a burden for them.

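To make the binNMU implication listed earlier concrete, here is a minimal Python sketch of the kind of check that refcnt'ing relies on: a path shipped by two arch instances of the same package can only be shared if its contents are byte-for-byte identical. This is a toy model and not dpkg's actual code; the function name, the libfoo1 package, its paths and its changelog text are all made up for illustration.

```python
import hashlib


def conflicting_shared_paths(instance_a, instance_b):
    """Return the sorted paths shipped by both instances with differing contents.

    Each instance is a hypothetical mapping of installed path -> file bytes.
    """
    conflicts = []
    for path in sorted(instance_a.keys() & instance_b.keys()):
        digest_a = hashlib.sha256(instance_a[path]).hexdigest()
        digest_b = hashlib.sha256(instance_b[path]).hexdigest()
        if digest_a != digest_b:
            conflicts.append(path)
    return conflicts


# Hypothetical changelog shared by all builds of libfoo 1.0-1.
changelog = b"libfoo (1.0-1) unstable; urgency=low\n\n  * Initial release.\n"
# Hypothetical extra entry prepended by a binNMU done only for i386.
binnmu_entry = b"libfoo (1.0-1+b1) unstable; urgency=low\n\n  * Rebuild.\n\n"

amd64 = {
    "/usr/share/doc/libfoo1/changelog.Debian": changelog,
    "/usr/share/doc/libfoo1/copyright": b"Copyright notice\n",
    "/usr/lib/x86_64-linux-gnu/libfoo.so.1": b"amd64 code",
}
i386 = {
    "/usr/share/doc/libfoo1/changelog.Debian": binnmu_entry + changelog,
    "/usr/share/doc/libfoo1/copyright": b"Copyright notice\n",
    "/usr/lib/i386-linux-gnu/libfoo.so.1": b"i386 code",
}

# The libraries live under arch-specific paths and never collide; the
# copyright file matches; the changelog differs because of the binNMU entry.
print(conflicting_shared_paths(amd64, i386))
```

In this model the only conflict reported is the changelog, which is why a binNMU for a single architecture would block co-installation unless every other architecture is rebuilt as well.
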
Even if no packages were to get split off and all arch-indep file paths
were arch-qualified (which would actually be wrong in some cases, as
some of those arch-indep files should not get an arch-qualified path),
and the overhead of the duplicated files was considered an issue
(although the actual libraries will usually use way more space than
those few duped files), there's always --path-exclude for the ones not
affecting functionality.

It does not seem to make sense to consider the “huge” space usage due
to not refcnt'ing an issue, when for that to happen and be significant
one would need to install hundreds of M-A:same packages for multiple
architectures, taking hundreds of MiB (if not GiB), at which point I'm
not sure how one can make a fuss over some hundred wasted MiB, if at
all.

For the unreliable generated output problem, even if gzip is to be
considered frozen and in maintenance mode now, that does not mean this
could not change in the future. It also means we cannot safely consider
switching compressors in the future, as we have cornered ourselves by
the design. Switching to uncompressed files to work around the
unreliable generated output problem still only papers over one part of
the issue, and defeats the size savings in common situations (single
arch installs).

----

So it really does not seem worth it: it does way more harm than good,
it will generate more overall waste, make transitions and upgrades more
difficult, and make the M-A:same packages even more asymmetric and
exceptional than they need to be. The size reduction arguments do not
really seem to hold, and it seems to be the overall more complex
solution to the problem.

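As an illustration of the generated output problem, the following Python sketch shows one well-known way compressed files can differ between builds: the gzip header carries a modification time, so compressing identical content at two different moments yields different bytes. The document content and the timestamps are made up, and this only demonstrates the timestamp case; different compressor versions or settings could still change the output even with the timestamp pinned.

```python
import gzip

# Hypothetical arch-independent documentation shipped by each arch build.
doc = b"libfoo API reference\n" * 100

# Two builds compressing the same content one hour apart (made-up times).
build_a = gzip.compress(doc, mtime=1328914580)
build_b = gzip.compress(doc, mtime=1328918180)

# The compressed files differ, although they decompress to the same data.
print(build_a == build_b)
print(gzip.decompress(build_a) == gzip.decompress(build_b))

# Pinning the header timestamp removes this particular source of variation.
pinned_a = gzip.compress(doc, mtime=0)
pinned_b = gzip.compress(doc, mtime=0)
print(pinned_a == pinned_b)
```

A content-equality check on the compressed files would reject the first pair even though nothing meaningful changed, which is the fragility described above.
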
In addition, concerning the mandatory files (copyright and changelog),
if we'd eventually go forward with my proposal to make them actual
package metadata, then dpkg can manage them in its db in any way we see
fit, including automatically compressing or refcnt'ing them, for
example when they actually match, and as such reducing installed size
usage.

Given all the above, I'll be pulling the file refcnt and version match
logic out of my pu/multiarch/master branch for now. If some compelling
arguments are brought up, something I honestly don't really see
happening, then it can be reintroduced at any point.

Proposed solution
-----------------

M-A:same packages cannot have any conflicting files with their foreign
counterparts. Thus:

For files in M-A:same packages under a pkgname based path, the pkgname
should always be arch-qualified with the Debian architecture. Most of
these could be handled automatically by debhelper and cdbs; this
includes things like:

  /usr/share/doc/pkgname/
  /usr/share/bug/pkgname
  /usr/share/lintian/overrides/pkgname
  /usr/share/mime-info/pkgname.*
  /usr/share/menu/pkgname
  ...

(Joey, I'm guessing you might consider it too late to do some of these
in debhelper for compat level 9, right?)

For toolchain related files in M-A:same packages, their path should get
arch-qualified using the multiarch triplet; this includes arch-dependent
headers and similar.

The remaining files that are truly arch-independent, like headers, man
pages, development docs, etc, should be split into arch:all package(s),
named something along these lines:

  libfooN-doc
  libfooN-headers
  libfooN-common
  libfooN-common-dev
  libfooN-data
  girX.Y-foo
  ...

Anything else remaining should be considered a bug.

regards,
guillem