(delurking) Ian Grant writes:
> In case it isn't obvious, what I am interested in is how easily we can know
> the problem of infeasibly large binaries isn't an instance of this one:
>
> http://livelogic.blogspot.com/2014/08/beware-insiduous-penetrator-my-son.html

Ah, this is commonly called the Thompson hack, since Ken Thompson
actually produced a successful demo:

http://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html

The only way that the Thompson hack can survive a three-stage bootstrap
is if the compiler used for the stage 1 build has the bad code. The
comparison between stages 2 and 3 requires an exact match, and any
imperfection in the object code injection would reveal itself.

So, you can build GCC with LLVM or Intel's compiler or Microsoft's or
IBM's or Sun's, doing cross-compilation where necessary. The basic
idea is:

1: Build gcc with a 3-stage bootstrap, starting with a compiler that
   you suspect might be infected. Call the result A.

2: Do it again, starting with a different compiler that you think is
   independent of the compiler you used in step 1. Call it B.

3: Compare A to B. If they differ, you've found something that should
   be investigated. If they don't, then either A and B are both clean,
   or A and B both have the identical inserted object code. Maybe they
   have a common ancestor?

Note that if you build gcc with a cross-compiler the object code will
be different. You have to use the cross-compiler to build one more time
to "normalize": GCC 4.9.0 built with GCC 4.9.0 on operating system X
should always be the same.

As far as I know, no one has been paranoid enough to put in the time to
do the experiment on a large scale, and it's harder because you can't
build a modern GCC (or LLVM for that matter) with an ancient compiler.
But you can create a chain: grab an ancient gcc version off a
15-year-old CD, and build newer versions with it until you get up to
the present. The result should be byte-for-byte identical with what you
get when building the current compiler with a recent version.
If it is, then either the infection is at least 15 years old or it does
not exist. Try it again by building cross-compilers from a Microsoft
system. Don't trust Apple: they used to use GCC, so maybe all their
LLVM binaries caught the bug.

BTW, if "size" is reporting a much smaller size than the executable
file itself and that motivates this concern, most of the difference is
likely to be debug info, which is bigger since gcc switched to C++.
Might want to try "strip".
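
For the "compare A to B" step, here is a minimal sketch in Python of one
way to do a byte-for-byte comparison. It assumes the two finished builds
have been installed into two separate directory trees. The function
names (file_digest, tree_digests, compare_builds) are made up for
illustration and are not part of GCC or any existing tool. It hashes
every regular file with SHA-256 and reports each relative path that is
missing on one side or differs in content.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each regular file under root (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_builds(a: Path, b: Path) -> List[str]:
    """List relative paths that exist on only one side or differ in content.

    An empty list means the two trees are byte-for-byte identical.
    """
    da, db = tree_digests(a), tree_digests(b)
    return sorted(
        name for name in da.keys() | db.keys() if da.get(name) != db.get(name)
    )
```

Something like compare_builds(Path("install-A"), Path("install-B")),
with those two directory names being placeholders, returns an empty
list when the trees match. Anything it does list is a candidate for the
"should be investigated" pile, not proof of an infection.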