On 7/2/19 8:26 AM, Adrian Bunk wrote:
On Mon, Jul 01, 2019 at 10:58:04AM -0500, Joshua Watt wrote:
...
1. HOSTTOOLS differences. There are a lot of tools listed in HOSTTOOLS, and
unfortunately some of them have version dependent output and are used for
target builds (the one I've currently stumbled upon is pod2man, but I'm sure
there are others). Unfortunately, one could probably argue that HOSTTOOLS is
somewhat antithetical to the above statement, at least in regard to target
builds. Any host tool output that "leaks" into the target build output can
result in a non-reproducible build across hosts, and possibly should be
avoided; the alternative is to use (or mandate) the corresponding -native
recipe that provides that tool as a DEPENDS so that the controlled
internally built version is used instead. Note that this only really applies
to target builds, not -native (or nativesdk right now). -native recipes would
obviously need more HOSTTOOLS to help bootstrap the system. I suspect this
would require reworking how HOSTTOOLS works so that the tools can be split into
two categories somehow; the tools that have "ubiquitous and stable"
interfaces and are fine for all recipes (e.g. cat, sed, true, rm, etc.) and
those that are variable and should only be used for -native builds (e.g.
pod2man, rpcgen(?), chrpath(?), tar(?)... others?). Anyone have thoughts on
this?
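
A rough sketch of the two-category idea, in Python purely for illustration.
The names STABLE_HOSTTOOLS, NATIVE_ONLY_HOSTTOOLS and allowed_hosttools are
hypothetical (they are not existing BitBake variables or functions), and the
tool lists simply repeat the examples above, including the ones marked with a
question mark. Treating recipes named "*-native" or "nativesdk-*" as exempt
is likewise an assumption made for the sketch.

```python
# Hypothetical split of HOSTTOOLS; neither set exists in OE-Core today.
# Tools with "ubiquitous and stable" interfaces, fine for all recipes.
STABLE_HOSTTOOLS = {"cat", "sed", "true", "rm"}
# Tools whose output may vary by host version (examples from the thread).
NATIVE_ONLY_HOSTTOOLS = {"pod2man", "rpcgen", "chrpath", "tar"}


def is_native_like(recipe_name):
    """Assumption: native/nativesdk recipes are recognised by name only."""
    return recipe_name.endswith("-native") or recipe_name.startswith("nativesdk-")


def allowed_hosttools(recipe_name):
    """Return the set of host tools a recipe would be allowed to see."""
    if is_native_like(recipe_name):
        # Bootstrap builds may use every host tool.
        return STABLE_HOSTTOOLS | NATIVE_ONLY_HOSTTOOLS
    # Target builds only get the stable subset.
    return set(STABLE_HOSTTOOLS)
```

With this split, a target recipe that needs pod2man would no longer find it
on the host and would have to pull it in through DEPENDS on the matching
-native recipe instead.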
...
What is the goal?
1. being able to prove that a given binary has actually been
built from the correct sources, or
2. builds on all hosts have the same output
I'm not sure there is just one goal...
With 1. you can just record all host properties like installed packages
and running kernel, and it isn't a problem if different hosts result in
different output.
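
To make "record all host properties" a bit more concrete, here is a minimal
Python sketch. The function name host_snapshot and the choice of fields are
assumptions for illustration; a real record would also need the installed
package list from the host's package manager, which is left out here.

```python
import json
import platform
import shutil


def host_snapshot(tools):
    """Collect a few host properties that could be archived with a build.

    `tools` is a list of command names; each is resolved to the path found
    on PATH, or None if the host does not provide it.
    """
    uname = platform.uname()
    return {
        "system": uname.system,
        "kernel": uname.release,  # version of the running kernel
        "machine": uname.machine,
        "tools": {tool: shutil.which(tool) for tool in tools},
    }


def snapshot_as_json(tools):
    """Serialise the snapshot with sorted keys so the record itself is stable."""
    return json.dumps(host_snapshot(tools), indent=2, sort_keys=True)
```

Calling snapshot_as_json(["pod2man", "tar", "chrpath"]) on the build host
would give a small record that can be stored next to the build output.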
Right... I know that my employer would really like this sort of binary
reproducibility; that is, we should be able to pull some archived code
out of our salt mine, build it, and know it's the same binary that our
customers have. I think if you combine what we have today and some sort
of reproducible host image (archived Docker container, virtual machine,
et al.) we are pretty close to that.
With 2. any difference in output caused by host differences is a problem.
You need -native for nearly everything, and then have to fix all other
sources of difference, like the version of the running kernel being
recorded somewhere.
Yes. I would hope that after using mostly -native tools where
applicable, the currently running kernel wouldn't figure into the build
of target packages... if it does I would venture to say that is a
cross-compiling/reproducibility bug in the package.
Also, to be clear, I'm hoping we don't need to go so far as to say that
-native recipes themselves necessarily need to be reproducible; as long as
the tools they provide always generate reproducible output regardless of
which host they were built on, I suspect they don't need to be.
For detecting malicious binaries not built from the claimed sources 1. is
sufficient. For distributions like Debian that build natively this is
even the only option available since the host compiler is used.
Doing 2. would of course be more desirable, but it can also be done in
a second step after all issues related to building on exactly the same
host have been sorted out.
I think there are also other use cases for #2 besides detecting
malicious binaries/source code, such as hash equivalence, or even being
able to use sstate when making a reproducible build. You are correct that
this can be done in a second step, but I think that everyone needs to be
aware of the limitations that arise when #2 is not achieved (the
main one being that you probably can't make a reproducible build if you
use sstate).
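
For what it's worth, checking #2 between two hosts boils down to comparing
checksums of the build output. A minimal Python sketch, assuming each host's
output has been copied into its own directory; the function names
tree_manifest and differing_files are made up for this example and are not
part of any existing tool:

```python
import hashlib
from pathlib import Path


def tree_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def differing_files(manifest_a, manifest_b):
    """List files that are missing on one side or whose digests differ."""
    names = set(manifest_a) | set(manifest_b)
    return sorted(n for n in names if manifest_a.get(n) != manifest_b.get(n))
```

An empty list from differing_files(tree_manifest(dir_a), tree_manifest(dir_b))
would mean the two hosts produced identical files; anything listed (a man
page generated by the host's pod2man, say) points at host leakage.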
Joshua Watt
...
cu
Adrian
--
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core