Re: [OE-core] The state of reproducible Builds

Joshua Watt Tue, 02 Jul 2019 07:19:00 -0700


On 7/2/19 8:26 AM, Adrian Bunk wrote:

On Mon, Jul 01, 2019 at 10:58:04AM -0500, Joshua Watt wrote:

...
1. HOSTTOOLS differences. There are a lot of tools listed in HOSTTOOLS, and
unfortunately some of them have version dependent output and are used for
target builds (the one I've currently stumbled upon is pod2man, but I'm sure
there are others). Unfortunately, one could probably argue that HOSTTOOLS is
somewhat antithetical to the above statement, at least in regard to target
builds. Any host tool output that "leaks" into the target build output can
result in a non-reproducible build across hosts, and possibly should be
avoided; the alternative is to use (or mandate) the corresponding -native
recipe that provides that tool as a DEPENDS so that the controlled
internally built version is used instead. Note that this only really applies
to target builds, not -native (or nativesdk right now). -native recipes would
obviously need more HOSTTOOLS to help bootstrap the system. I suspect this
would require reworking how HOSTTOOLS works so that they can be split into
two categories somehow; the tools that have "ubiquitous and stable"
interfaces and are fine for all recipes (e.g. cat, sed, true, rm, etc.) and
those that are variable and should only be used for -native builds (e.g.
pod2man, rpcgen(?), chrpath(?), tar(?)... others?). Anyone have thoughts on
this?
...

What is the goal?

1. being able to prove that a given binary has actually been
    built from the correct sources, or
2. builds on all hosts have the same output

I'm not sure there is just one goal...

With 1. you can just record all host properties like installed packages
and running kernel, and it isn't a problem if different hosts result in
different output.
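
As a rough illustration of recording host properties for goal 1, a minimal
Python sketch might look like the following. The names host_snapshot and
snapshot_as_json are hypothetical, and the sketch only captures kernel and
interpreter details; the list of installed packages would have to come from
the distribution's package manager and is left out here.

```python
import json
import platform


def host_snapshot():
    """Collect a few host properties that could influence a build."""
    uname = platform.uname()
    return {
        "system": uname.system,
        "kernel_release": uname.release,
        "kernel_version": uname.version,
        "machine": uname.machine,
        "python": platform.python_version(),
    }


def snapshot_as_json(snapshot):
    """Serialize the snapshot so it can be archived next to the build output."""
    # Sorted keys keep the record itself stable from run to run.
    return json.dumps(snapshot, sort_keys=True, indent=2)
```

Archiving such a record alongside the binaries only documents the host; it
does nothing to make the output independent of it.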

Right... I know that my employer would really like this sort of binary
reproducibility; that is, we should be able to pull some archived code
out of our salt mine, build it, and know it's the same binary that our
customers have. I think if you combine what we have today and some sort
of reproducible host image (archived Docker container, virtual machine,
et al.) we are pretty close to that.


With 2. any kind of differences due to host differences is a problem.
You need -native for nearly everything, and then fix all other kinds of
differences like the version of the running kernel recorded somewhere.

Yes. I would hope that after using mostly -native tools where
applicable, the currently running kernel wouldn't figure into the build
of target packages... if it does, I would venture to say that is a
cross-compiling/reproducibility bug in the package.

Also, to be clear, I'm hoping we don't need to go so far as to say that
-native recipes need to necessarily be reproducible; as long as they
always generate reproducible output regardless of which host they were
built on I suspect they don't need to be.


For detecting malicious binaries not built from the claimed sources 1. is
sufficient. For distributions like Debian that build natively this is
even the only option available since the host compiler is used.

Doing 2. would of course be more desirable, but it can also be done in
a second step after all issues related to building on exactly the same
host have been sorted out.

I think there are also other use cases for #2 besides detecting
malicious binaries/source code, such as hash equivalence, or even being
able to use sstate when making a reproducible build. You are correct that
this can be done in a second step, but I think that everyone needs to be
aware of the limitations that will be present when #2 is not present (the
main one being that you probably can't make a reproducible build if you
use sstate).
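
To make the "same output on every host" idea in #2 concrete, here is a small
Python sketch that reduces an output directory to a single digest so that
builds from two hosts can be compared. This is only an illustration, not how
hash equivalence or sstate actually compute their hashes; tree_digest and
same_output are hypothetical names, and the sketch ignores permissions,
ownership, timestamps and symlink targets.

```python
import hashlib
import os


def tree_digest(root):
    """Return one SHA-256 digest over relative paths and file contents under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            file_hash = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    file_hash.update(chunk)
            # Mix in the relative path so renamed files change the result.
            digest.update(rel.encode("utf-8") + b"\0")
            digest.update(file_hash.hexdigest().encode("ascii") + b"\n")
    return digest.hexdigest()


def same_output(build_a, build_b):
    """True if both output trees contain identical paths and contents."""
    return tree_digest(build_a) == tree_digest(build_b)
```

If two hosts produce different digests for the same recipe, something from
the host has leaked into the target output.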

Joshua Watt
...

cu
Adrian

--
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
