On Thu, Sep 18, 2014 at 9:37 PM, Joe Buck <joe.b...@synopsys.com> wrote:
> (delurking)
> Ah, this is commonly called the Thompson hack, since Ken Thompson
> actually produced a successful demo:

How do you know Thompson's attempt was the first instance? The document I refer to in the blog is the "Unknown Air Force Report" Thompson refers to. It was written by Roger Schell (cc'ed).

> http://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html
> The only way that the Thompson hack can survive a three-stage
> bootstrap is if the compiler used for the stage 1 build has the bad
> code.

This is the overwhelmingly likely (probability 1) case. How else would the stage-2 and stage-3 compilers get the bad code?

> The comparison between stages 2 and 3 require exact match,
> and any imperfection in the object code injection would reveal itself.

How? In the output of a utility, or a system device driver, on a system booted from a boot loader and using standard libraries such as libc, all compiled by the same bug in the compilers which compiled the stage 1, 2 and 3 C compilers?

> So, you can build GCC with LLVM or Intel's compiler or Microsoft's or IBM's
> or Sun's, doing cross-compilation where necessary.

Do these compilers all support cross-OS compilation to any OS? It sounds a bit hard to me. I just can't imagine MS, say, going to a great deal of trouble to make sure that their compiler targets Linux and OpenBSD. GCC needs quite a lot of library and OS support, doesn't it? People will have to help me a bit with this; I've not yet managed to cross-compile anything.

This thread started because I was just trying to build gcc from vanilla gcc-4.9 sources on OpenBSD, and it doesn't work. See the earlier messages. I was next going to try to build gcc-4.9 on OpenBSD, cross-targeting Linux on the same physical machine (i.e. same CPU), but I don't imagine this will be at all easy, given I can't even build the vanilla sources. People say there is chocolate source, but no-one has told me where it is yet!
> The basic idea is:
>
> 1: build gcc with 3-stage bootstrap, starting with a compiler that you
> suspect might be infected. call the result A.
> 2: do it again, starting with a different compiler that you think is
> independent of the compiler you used in step 1. call it B.
> 3: compare A to B. If they differ, you've found something that should
> be investigated. If you don't, then either A and B are both clean, or A
> and B both have the identical inserted object code. Maybe they have
> common ancestor?
>
> Note that if you build gcc with a cross-compiler the object code will be
> different.
> You have to use the cross-compiler to build one more time to "normalize":
> GCC 4.9.0 built with GCC 4.9.0 on operating system X should always be
> the same.

Yes, but the problem is when the object code bug is not in the compiler binaries: it's something injected into the compiler binaries from the infected ld.so, or glibc, or the IDE disk device driver, and it infects the source to those programs.

> As far as I know no one has been paranoid enough to put in the time to do
> the experiment on a large scale, and it's harder because you can't build
> a modern GCC (or LLVM for that matter) with an ancient compiler. But
> you can create a chain: grab an ancient gcc version off a 15-year-old CD,

When did you last try grabbing an ancient gcc off a 15-year-old CD and getting it to run on a modern OS? Was it easy?

> and build newer versions with it until you get up to the present.

And the rest of the chain, is that any easier?

> The result should be byte-for-byte identical with what you get when
> building the current compiler with a recent version.

And what does that tell me, really?

> If it is, then either the infection is 15 years old or does not exist.

How do you figure that?

> Try it again by building cross-compilers from a Microsoft system.
> Don't trust Apple, they used to use GCC so maybe all their LLVM
> binaries caught the bug.

Interesting idea.
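
For anyone who wants to try step 3 of the quoted procedure (compare A to B), here is a rough sketch of the comparison in Python. It assumes the two bootstrap results have been installed into two separate directory trees; the function names (sha256_of, hash_tree, compare_trees) and the idea of comparing installed trees rather than individual object files are my own assumptions, not anything GCC ships. It only reports which files differ. It says nothing about why they differ, and it cannot help when both trees carry the same injected code.

```python
import hashlib
import os
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each regular file's path (relative to root) to its digest.

    Symbolic links are skipped so that only real file contents are compared.
    """
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            digests[os.path.relpath(full, root)] = sha256_of(full)
    return digests


def compare_trees(tree_a, tree_b):
    """Return (only_in_a, only_in_b, differing) as sorted lists of paths."""
    a = hash_tree(tree_a)
    b = hash_tree(tree_b)
    only_in_a = sorted(set(a) - set(b))
    only_in_b = sorted(set(b) - set(a))
    differing = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_in_a, only_in_b, differing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_builds.py /path/to/A /path/to/B
    only_in_a, only_in_b, differing = compare_trees(sys.argv[1], sys.argv[2])
    for label, paths in (("only in A", only_in_a),
                         ("only in B", only_in_b),
                         ("differs", differing)):
        for path in paths:
            print(f"{label}: {path}")
```

An empty report would mean the two trees are byte-for-byte identical, which is the "either both clean or both identically infected" case from the quoted step 3.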
> BTW, if "size" is reporting much smaller size than the executable
> file itself and that motivates this concern, most of the difference
> is likely to be debug info, which is bigger since gcc switched to
> C++. Might want to try "strip".

Great. As I said, the exercise we are here engaged in is to convince as many people as possible that GCC does NOT suffer from this problem on any OS, either OS, Windows, OpenBSD, FreeBSD, Solaris, or Linux on any arch., including IBM System z.

So can someone tell me the quickest way to build a new set of binaries, stripped, or just how to tell whether the stage-1 binaries are in fact stripped or not? And can anyone tell me what are the 'non-vanilla' sources?

Ian