[OE-core] The state of reproducible Builds

Joshua Watt Mon, 01 Jul 2019 08:58:37 -0700

All,

I've been working on making OE builds reproducible (that is, any two given builds can have binary-identical outputs). The current "test" for reproducibility involves building core-image-minimal in two different build directories, then doing a binary diff of the resulting target Debian package files and reporting if any of them differ (I'd like to expand this test, see below). I believe that we are very close to achieving this level of reproducibility, with a few caveats:


1. Both builds must be clean builds from scratch

2. Neither build can use sstate (sstate isn't currently reproducible for a variety of reasons, more on that later)

3. The QA test for reproducibility takes about 4 hours on my 4/8 core i7-3770 CPU @ 3.40GHz. I'm not sure how "expensive" a test has to be before it can't reasonably be run on the autobuilders, but I'm guessing this isn't a QA test that could currently be run very often (if at all). If sstate were reproducible, this would effectively be cut in half, since you would only need one clean build from scratch (if that would even matter).
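
The comparison step of the test described above can be sketched roughly as follows. This is only an illustration, not the actual QA test code: the function names and the "*.deb" glob are assumptions, and each directory is taken to be the package deploy directory of one of the two builds.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def package_digests(deploy_dir: Path, pattern: str = "*.deb") -> dict:
    """Map each package path (relative to deploy_dir) to its digest."""
    return {
        p.relative_to(deploy_dir).as_posix(): file_digest(p)
        for p in sorted(deploy_dir.rglob(pattern))
        if p.is_file()
    }


def compare_builds(dir_a: Path, dir_b: Path, pattern: str = "*.deb"):
    """Return (missing, differing) package names between two builds.

    missing:   packages present in only one of the two builds
    differing: packages present in both whose contents are not identical
    """
    a = package_digests(dir_a, pattern)
    b = package_digests(dir_b, pattern)
    missing = sorted(set(a) ^ set(b))
    differing = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return missing, differing
```

Two builds would count as reproducible at this level only if both returned lists are empty. Comparing digests rather than file names alone is what makes this a binary comparison.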

The current test is obviously deficient in a few areas, but I believe that it is at the very least a good starting point since it has already uncovered numerous reproducibility issues. The places where I think it needs to be improved are:

1. Testing RPM and IPK package formats. I think RPMs will be pretty easy; IPKs might be more challenging since AFAIK the tools that make them don't generate reproducible output to begin with.

2. Testing more images than core-image-minimal. This should be pretty straightforward to add to the QA test; it's mostly a matter of fixing all the issues that come up.

3. Testing for binary reproducible images (e.g. check that the entire ext4 image produced is binary identical). This one also might be pretty easy for some formats, and hard for others (e.g. ext4 I think would be easy, squashfs might be hard).

4. Improving the test to better exercise timestamp changes. Currently, the QA test runs the two test builds serially, which ensures that they have a different timestamp when building. However, there are some packages that are not reproducible based on only the day, month, or year, none of which is likely to be different between the two serial test builds. I would like to figure out a way to force the two builds to be separated by a sufficient amount of time to tease out these issues. This might be as easy as running bitbake under faketime, or it might be more involved.


5. I don't know if anyone is clamoring for reproducible nativesdk builds?

6. We should also be testing if sstate objects are reproducible, otherwise sstate can't really be relied on when doing a reproducible build (in fact, I think the original reproducible build work that I took over was focused on making sstate reproducible).
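
To make item 4 concrete, here is a small sketch of why serial builds miss date-granularity problems. It does not involve bitbake or faketime at all; it only simulates a package that embeds a formatted build time in its output, and the helper names and the example dates are made up for illustration.

```python
from datetime import datetime, timedelta


def embedded_stamp(build_time: datetime, fmt: str) -> str:
    """Simulate a package that embeds a formatted build time in its output."""
    return build_time.strftime(fmt)


def detects_issue(first: datetime, second: datetime, fmt: str) -> bool:
    """True if the two builds' outputs differ, i.e. the test catches the issue."""
    return embedded_stamp(first, fmt) != embedded_stamp(second, fmt)


first = datetime(2019, 7, 1, 9, 0, 0)
serial = first + timedelta(hours=2)     # second build run right after the first
shifted = first + timedelta(days=400)   # second build run with a shifted clock

# A stamp with second granularity is caught even by serial builds.
print(detects_issue(first, serial, "%Y-%m-%d %H:%M:%S"))  # True
# A date-only stamp is missed when both builds run on the same day...
print(detects_issue(first, serial, "%Y-%m-%d"))           # False
# ...but is caught once the day, month, and year all differ.
print(detects_issue(first, shifted, "%Y-%m-%d"))          # True
print(detects_issue(first, shifted, "%Y"))                # True
```

The 400-day offset is arbitrary; the point is only that the second build's clock has to differ in every date field a package might embed.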

I think that OE has some significant advantages in being able to make reproducible builds compared to other projects attempting the same thing; primarily, we are capable of building up all (or most) of the required build tools internally, then using these internal tools to build up the target (e.g. we build GCC for the target, then use this built GCC to compile target source). This means that we have a great opportunity to isolate the build from the host environment and truly achieve "simple" reproducible builds; any given set of layers at their respective SHAs should be able to build a binary identical output on any given host, with (ideally) no dependency on the host. We can't do this today, and I've identified a number of roadblocks that will need to be resolved (this is not a complete list; there will be more):

1. HOSTTOOLS differences. There are a lot of tools listed in HOSTTOOLS, and unfortunately some of them have version dependent output and are used for target builds (the one I've currently stumbled upon is pod2man, but I'm sure there are others). Unfortunately, one could probably argue that HOSTTOOLS is somewhat antithetical to the above statement, at least in regard to target builds. Any host tool output that "leaks" into the target build output can result in a non-reproducible build across hosts, and possibly should be avoided; the alternative is to use (or mandate) the corresponding -native recipe that provides that tool as a DEPENDS so that the controlled, internally built version is used instead. Note that this only really applies to target builds, not -native (or nativesdk right now). -native recipes would obviously need more HOSTTOOLS to help bootstrap the system. I suspect this would require reworking how HOSTTOOLS works so that they can be split into two categories somehow: the tools that have "ubiquitous and stable" interfaces and are fine for all recipes (e.g. cat, sed, true, rm, etc.) and those that are variable and should only be used for -native builds (e.g. pod2man, rpcgen(?), chrpath(?), tar(?)... others?). Anyone have thoughts on this?

2. sstate currently isn't reproducible. This is at least partially related to why non-clean rebuilds aren't reproducible[1]. These two are related because AFAIK there isn't really any way of knowing if an sstate object came from a clean build of a recipe or a rebuild of a recipe, so as long as rebuilds aren't reproducible, neither will sstate be reproducible. The simplest fix for these problems is to add more -native tools to DEPENDS if they are used by the builds so that they are "stable" across all the tasks where it matters, but there might also be some more "tricky" things that can/should be done with recipe-specific sysroots (RSS) to help mitigate the problem. The HOSTTOOLS issue also makes sstate non-reproducible, since AFAIK, there isn't necessarily a way to ensure that an sstate object came from a specific host. In fact, I would speculate that most core reproducibility issues will also make sstate non-reproducible. Reproducible sstate also plays directly into hash equivalence, since it is based on sstate and would be *much* more effective if sstate were reproducible.
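
As a way of thinking about the two-category HOSTTOOLS split suggested in roadblock 1, here is a toy model. It is not how HOSTTOOLS works today and not a proposed implementation: the set names, the helper functions, and the recipe name are all hypothetical, and the tool lists simply reuse the examples given above (including the ones marked with a question mark).

```python
# Hypothetical classification; the tool names reuse the examples above.
# rpcgen, chrpath and tar were listed with a "(?)", so their category is a guess.
STABLE_TOOLS = {"cat", "sed", "true", "rm"}
NATIVE_ONLY_TOOLS = {"pod2man", "rpcgen", "chrpath", "tar"}


def allowed_host_tools(recipe_name: str) -> set:
    """Host tools a recipe may use under the two-category split."""
    if recipe_name.endswith("-native"):
        # -native recipes may use everything, to help bootstrap the system.
        return STABLE_TOOLS | NATIVE_ONLY_TOOLS
    # Target recipes only get the "ubiquitous and stable" tools.
    return set(STABLE_TOOLS)


def leaked_tools(recipe_name: str, tools_used: set) -> set:
    """Host tools a recipe uses that it should get from a -native DEPENDS instead."""
    return set(tools_used) - allowed_host_tools(recipe_name)


# A target recipe using pod2man from the host is flagged...
print(leaked_tools("example-recipe", {"sed", "pod2man"}))         # {'pod2man'}
# ...while the same usage in a -native recipe is not.
print(leaked_tools("example-recipe-native", {"sed", "pod2man"}))  # set()
```

In this model, each flagged tool corresponds to a -native recipe that would have to be added to DEPENDS.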

Many of the remaining problems can be solved by adding more -native recipes to DEPENDS, but this has met with some (justified) push back; doing this will likely increase the build time, since more -native dependencies will mean more -native tools have to be built, and more serialization of the builds waiting for those tools to be built. I suspect this is more true for replacing HOSTTOOLS with -native recipes, since many of them may not have needed to be built at all. For the sstate/rebuild reproducibility this is likely to have less impact, since those recipes were eventually going to be built anyway to be included in the RSS. Adding them to DEPENDS just moves them to be included sooner.

I'm curious what people think about all this: How important is reproducibility? How reproducible do we want to be? How hard should it be to have reproducible builds? What trade-offs are we willing to make for reproducible builds? Are there smart ways we can mitigate some of the potential performance impacts of reproducible builds?



Thanks for your time. I know this was a long e-mail.

Joshua Watt

[1]: https://bugzilla.yoctoproject.org/show_bug.cgi?id=13378


