I generally agree or sympathize with what you say, so below I only
comment on things I find interesting to discuss further -- hoping that
this isn't interpreted as unintentionally confrontational but instead as
playfully curious.
Vagrant Cascadian <vagr...@reproducible-builds.org> writes:

>> If there is a reproducability difference between these two approaches,
>> isn't that something that should be fixed?
>
> If we rebuild with a different toolchain, I expect to get different
> results.
>
> You might get lucky and get reproducibility with small variations in the
> toolchain, but in general, it seems unreasonable to expect with
> different inputs you get the same outputs.

I think that is an over-generalization of what I expect -- I only expect
that using the _latest_ available input (i.e., the latest version of the
build-depends) should lead to the same output as the one we publish.
Not all different inputs, which I agree is unreasonable and impossible.

The expectation that using the latest available build dependencies leads
to the same binary we publish seems reasonable to me, and something
worthwhile to work on. Giving up on this goal seems like giving up on
the second stage of building gcc with itself and comparing the output to
the previous build. It makes things much easier, but so does giving up
on all hard problems.

> Something that would make that much easier for any given release is if
> the entire release was rebuilt at least once (ideally several times) in
> the development cycle with most packages built against a narrow set of
> the toolchain (e.g. rebuild all of the build essential set, and then
> rebuild everything from there on out). That would help reduce the
> numerous permutations of a given compiler down to a smaller set of
> versions, at least.

Yes! Doing that may be a corollary of my expectation. I think the most
practical way to reach system-wide idempotent rebuilds is to rebuild
iteratively using the latest available build inputs, fixing whatever
differences show up, until you get identical rebuilt packages -- and at
that point you publish that system-wide set of binary packages.
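To make the "rebuild until identical" idea concrete, here is a minimal
sketch of such a fixpoint loop in Python. Everything here is
hypothetical scaffolding -- `build` stands in for whatever actually
builds a package against the current binary set, and comparison is by
artifact hash:

```python
import hashlib


def artifact_hash(data: bytes) -> str:
    """Hash built artifact bytes so two builds can be compared byte-for-byte."""
    return hashlib.sha256(data).hexdigest()


def rebuild_until_fixpoint(packages, build, max_rounds=10):
    """Iteratively rebuild all packages against the latest outputs until
    two consecutive rounds produce byte-identical artifacts.

    `build(pkg, inputs)` is a hypothetical callback: it builds `pkg`
    against the binary set `inputs` (a dict of package name -> artifact
    hash from the previous round) and returns the artifact bytes.
    """
    hashes = {}
    for _ in range(max_rounds):
        new_hashes = {p: artifact_hash(build(p, hashes)) for p in packages}
        if new_hashes == hashes:
            # Fixpoint reached: the set of packages rebuilds itself
            # identically, i.e. the rebuild is idempotent.
            return hashes
        hashes = new_hashes
    raise RuntimeError("no fixpoint within %d rounds" % max_rounds)
```

In practice each round where `new_hashes != hashes` is where the "fix
whatever differences there are" work happens before rebuilding again.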
What I fear here is the work involved in resolving recursive cycles in
the rebuild graph. But from an attacker's point of view, that is exactly
where you want to put your malicious code. Attackers are likely to find
these weak spots, if they haven't already, since nobody is looking for
them.

>> One goal with all this is that we can identify what source code was
>> used to build the software we use, and be able to audit that. It is
>> less work to audit all of trixie+X source code than to audit all of
>> trixie+X PLUS all required build-dependencies going back to the
>> beginning of time, which may include no longer available packages (for
>> legal or technical reasons).
>
> Agreed that this is unfortunate... though practically speaking, I fear
> this may be the necessity.

Yes, I also fear that, and this may be the practical outcome of an
idempotent Debian rebuild project. Having the details of why this is the
case -- for example, that everything in trixie+X eventually
build-depends on package Y from 2002 that is no longer legal to
redistribute -- would be an improvement over the current state of just
guessing/fearing that this is the case. I have hope that if a particular
package like that can be identified, there will be interest in
re-implementing the properties of that package to fix things.

Another feature idempotent rebuilds may help with is re-bootstrapping
the entire Debian trixie+X release from another operating system like
Guix or macOS. Rebuilding all packages in trixie+X from Guix directly is
simpler than rebuilding all reverse build dependencies of trixie+X and
then building trixie+X using those reverse build dependencies.

/Simon
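P.S. The cycles feared above can at least be found mechanically. A
minimal sketch using Python's standard-library graphlib, with a toy
build-dependency map (the package names and dependency data are
hypothetical; in Debian they would come from the Build-Depends fields):

```python
from graphlib import TopologicalSorter, CycleError


def find_build_cycle(build_depends):
    """Try to order packages so each is built after its build-dependencies.

    `build_depends` maps package name -> set of build-dependencies.
    Returns None if a build order exists, otherwise the list of nodes
    forming a detected cycle.
    """
    try:
        # static_order() raises CycleError if the graph is not a DAG.
        tuple(TopologicalSorter(build_depends).static_order())
        return None
    except CycleError as e:
        # graphlib reports the offending cycle as the second args element.
        return e.args[1]
```

A cycle returned here is exactly the kind of weak spot that needs manual
breaking (e.g. a staged bootstrap build) before iterative rebuilding can
converge.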