On Mon, Apr 1, 2024 at 2:26 PM Zack Weinberg <z...@owlfolio.org> wrote:
>
> On Mon, Apr 1, 2024, at 2:04 PM, Russ Allbery wrote:
> > "Zack Weinberg" <z...@owlfolio.org> writes:
> >> It might indeed be worth thinking about ways to minimize the
> >> difference between the tarball "make dist" produces and the tarball
> >> "git archive" produces, starting from the same clean git checkout,
> >> and also ways to identify and audit those differences.
> >
> > There is extensive ongoing discussion of this on debian-devel. There's
> > no real consensus in that discussion, but I think one useful principle
> > that's emerged that doesn't disrupt the world *too* much is that the
> > release tarball should differ from the Git tag only in the form of
> > added files. Any files that are present in both Git and in the release
> > tarball should be byte-for-byte identical.
>
> That dovetails nicely with something I was thinking about myself.
> Obviously the result of "make dist" should be reproducible except for
> signatures; to the extent it isn't already, those are bugs in automake.
> But also, what if "make dist" produced *two* disjoint tarballs? One of
> which is guaranteed to be byte-for-byte identical to an archive of the
> VCS at the release tag (in some clearly documented fashion; AIUI, "git
> archive" does *not* do what we want).
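
For what it's worth, the "added files only" principle Russ describes is easy to check mechanically. Below is a rough Python sketch, not anything automake provides: the function names (file_digests, compare_tarballs) are made up for illustration, and it assumes both tarballs unpack into a single top-level directory such as "pkg-1.0/". It reports files that are in the VCS archive but missing from the release tarball, files present in both whose contents differ, and files that only the release tarball has.

```python
import hashlib
import tarfile


def file_digests(tar_path):
    """Map each regular file's path to its SHA-256 digest.

    Assumption: every member lives under one top-level directory
    (e.g. "pkg-1.0/"), which is stripped so the two tarballs compare.
    """
    digests = {}
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            # Drop the leading "pkg-version/" component, if there is one.
            _, _, rel = member.name.partition("/")
            data = tar.extractfile(member).read()
            digests[rel or member.name] = hashlib.sha256(data).hexdigest()
    return digests


def compare_tarballs(vcs_tar, dist_tar):
    """Return (missing, modified, added) relative to the VCS archive."""
    vcs = file_digests(vcs_tar)
    dist = file_digests(dist_tar)
    missing = sorted(set(vcs) - set(dist))    # in VCS, not in release
    modified = sorted(p for p in set(vcs) & set(dist) if vcs[p] != dist[p])
    added = sorted(set(dist) - set(vcs))      # only in release tarball
    return missing, modified, added
```

Under that principle "modified" should always come back empty, and "added" is exactly the list of files someone would need to audit.
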

Thinking about how to implement this: so, currently automake variables
have (at least) 2 special prefixes (that I can think of at the moment)
that control various automake behaviors: "dist" or "nodist" to control
inclusion in the distribution, and "noinst" to prevent installation.
What about a 3rd one of these prefixes: "novcs", to teach automake about
which files belong in VCS or not? i.e. then you might have a variable
name like:

    dist_novcs_DATA = foo bar baz

...which would indicate that foo, bar, and baz are data files that ought
to be distributed in the release tarball, but not in the VCS-based one?
Or would it be easier to just teach automake to read .gitignore files
and the like so that it can get that information from there?

> The other contains all the files that "autoreconf -i" or "./bootstrap.sh"
> or whatever would create, but nothing else. Diffs could be provided
> for both tarballs, or only for the VCS-archive tarball, whichever turns
> out to be more compact (I can imagine the diff for the generated-files
> tarball turning out to be comparable in size to the generated-files
> tarball itself).
>
> This should make it much easier to find, and therefore audit, the pre-
> generated files, and to validate that there's no overlap. It would add
> an extra step for people who want to build from tarball, without having
> to install autoconf (or whatever) first -- but an easier extra step
> than, y'know, installing autoconf. :) Conversely, people who want to
> build from tarballs but *not* use the pre-generated configure, etc,
> could now download the 'bare' tarball only.
>
> ("Couldn't those people just build from a git checkout?" Not if they
> don't have the tooling for it, not during early stages of a distribution
> bootstrap, etc. Also, the act of publishing a tarball that's a golden
> copy of the VCS at the release tag is valuable for archival purposes.)
>

Agreed on these points.

> zw