I generally agree or sympathize with what you say, so below I only
comment on things I find interesting to discuss further -- hoping that
this isn't interpreted as confrontational but rather as playfully
curious.

Vagrant Cascadian <vagr...@reproducible-builds.org> writes:

>> If there is a reproducibility difference between these two approaches,
>> isn't that something that should be fixed?
>
> If we rebuild with a different toolchain, I expect to get different
> results.
>
> You might get lucky and get reproducibility with small variations in
> the toolchain, but in general, it seems unreasonable to expect that
> with different inputs you get the same outputs.

I think that is an over-generalization of what I expect -- I only
expect that using the _latest_ available inputs (i.e., the latest
versions of the build-depends) should lead to the same output as the
one we publish.  I don't expect that for all different inputs, which I
agree is unreasonable and impossible.

The expectation that using the latest available build dependency leads
to the same binary we publish seems reasonable to me, and something
worthwhile to work on.
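
To be concrete about "same output": what I mean is bit-identical
artifacts, i.e., a check along these lines (the file names are made up
for illustration):

    import hashlib

    def sha256(path):
        # Hash the whole artifact; "reproduced" means bit-identical.
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # The .deb we published vs. the same source rebuilt today against
    # the latest available build-depends (paths are hypothetical).
    published = sha256("hello_2.10-3_amd64.deb")
    rebuilt = sha256("rebuilt/hello_2.10-3_amd64.deb")
    print("reproduced" if published == rebuilt else "differs")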

Giving up on this goal seems like giving up on the second stage of
building gcc with itself and comparing the output to the previous
build.  It makes things much easier, but so does giving up on all hard
problems.

> Something that would make that much easier for any given release is if
> the entire release was rebuilt at least once (ideally several times) in
> the development cycle with most packages built against a narrow set of
> the toolchain (e.g. rebuild all of the build essential set, and then
> rebuild everything from there on out). That would help reduce the
> numerous permutations of a given compiler down to a smaller set of
> versions, at least.

Yes!  Doing that may be a corollary of my expectation.  I think the
most practical way to reach system-wide idempotent rebuilds is
iterative rebuilding using the latest available build inputs, fixing
whatever differences there are, until you get identical rebuilt
packages; at that point you publish that system-wide set of binary
packages.
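
In sketch form, the loop I have in mind is something like this
(Python; rebuild_everything() and the output directory are
hypothetical placeholders):

    import hashlib
    from pathlib import Path

    def hash_tree(directory):
        # Map each rebuilt .deb to its SHA-256 digest.
        return {p.name: hashlib.sha256(p.read_bytes()).hexdigest()
                for p in sorted(Path(directory).glob("*.deb"))}

    def rebuild_until_fixpoint(rebuild_everything, outdir, max_rounds=10):
        # Rebuild the whole set against the latest inputs until two
        # consecutive rounds produce bit-identical packages.
        previous = hash_tree(outdir)
        for _ in range(max_rounds):
            rebuild_everything()   # hypothetical: rebuild all packages
            current = hash_tree(outdir)
            if current == previous:
                return current     # fixpoint reached; publish this set
            previous = current
        raise RuntimeError("no fixpoint; differences still remain")

If the loop terminates, every package in the published set was built
by the toolchain contained in that same set.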

What I fear here is the work involved in resolving recursive cycles in
the rebuild graph.  But from an attacker's point of view, that's
exactly where you want to put your malicious code.  Attackers are
likely to find these weak spots, if they haven't already, since nobody
is looking for them.
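
To illustrate, spotting those cycles is mechanical once you have the
build-depends graph (a made-up three-package example, using Python's
standard graphlib):

    from graphlib import TopologicalSorter, CycleError

    # Made-up example: gcc build-depends on glibc and vice versa.
    build_depends = {
        "gcc": {"glibc", "binutils"},
        "glibc": {"gcc"},
        "binutils": set(),
    }

    try:
        order = list(TopologicalSorter(build_depends).static_order())
        print("acyclic rebuild order:", order)
    except CycleError as err:
        # The offending cycle is the second element of err.args,
        # e.g. ['gcc', 'glibc', 'gcc'].  Breaking it takes manual
        # bootstrapping work -- and is where an attacker would hide.
        print("rebuild cycle:", err.args[1])

Every nontrivial cycle found this way is a place where bootstrapping
has to be done by hand, and thus a place worth auditing first.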

>> One goal with all this is that we can identify what source code was
>> used to build the software we use, and be able to audit that.  It is
>> less work to audit all of trixie+X source code than to audit all of
>> trixie+X PLUS all required build-dependencies going back to the
>> beginning of time, which may include no longer available packages (for
>> legal or technical reasons).
>
> Agreed that this is unfortunate... though practically speaking, I fear
> this may be the necessity.

Yes, I also fear that, and this may be the practical outcome of an
idempotent Debian rebuild project.  Having the details of why this is
the case -- for example, that everything in trixie+X eventually
build-depends on package Y from 2002 that is no longer legal to
redistribute -- would be an improvement over the current state of just
guessing/fearing that this is the case.  I have hope that if a
particular package like that can be identified, there will be interest
in re-implementing the properties of that package to fix things.

Another thing idempotent rebuilds may help with is re-bootstrapping
the entire Debian trixie+X release from another operating system like
Guix or macOS.  Rebuilding all packages in trixie+X from Guix directly
is simpler than first rebuilding the entire historical chain of build
dependencies of trixie+X and then building trixie+X using those build
dependencies.

/Simon
