On 3/4/21 6:50 PM, Stefan Monnier wrote:
The abstract states:
"In the DDC technique, source code is compiled twice: once with a
second (trusted) compiler (using the source code of the compiler’s
parent), and then the compiler source code is compiled using the
result of the first compilation. If the result is bit-for-bit
identical with the untrusted executable, then the source code
accurately represents the executable."
I find the above to be unclear:
Of course, it's an abstract. You can read the paper for more
details. Basically:
Take an existing untrusted compiler whose source code is A and binary
is cA. You check that:
cA == cA(A)
if it's not the case (or if you don't have access to the source code
A), the DDC technique can't be used. If it is the case, you have
just checked that `A` is indeed the source code for `cA`.
You have checked that 'cA' can reproduce itself given 'A' as input.
This is the key feature of the TT attack -- the machine code executable
'cA' contains a backdoor, but the source code 'A' does not.
The backdoor source code was removed from 'A' once the backdoor machine
code injection was working in 'cA'. The backdoor machine code injection
in 'cA' is what perpetuates the TT attack whenever 'cA' is recompiled
from 'A'.
Then take a trusted compiler whose source code is T. Now compile it
with `cA`:
cT = cA(T)
and then use this new compiler binary `cT` to compile `A` a second
time:
cA2 = cT(A)
finally compare `cA` and `cA2`. If they're bit-for-bit identical,
then you're good: `cA` doesn't seem to have any hidden trojan horse.
You need to not only trust that [T] does what you think it does, but
also that any attacker who may have infected `cA` hasn't seen [T]
and can't have guessed enough of its content to be able to properly
infect `cT`.
Thanks for the explanation.
AIUI compilers have been studied so extensively that their production is
largely automated. Create an EBNF specification, feed it through a tool
chain (lex, yacc, cc, as, ld, etc.), and you end up with a compiler.
The process is known and the results are predictable; especially with
standards-based languages such as C. So, a skilled attacker will know
what you're doing, how you are doing it, and may be able to produce a
'cA' that infects both 'A' and 'T'.
If you are going to produce source code for a trusted compiler 'T', then
you should also produce an executable 'cT'. AIUI this can be done by
writing a simplified compiler in some other language 'a', using that to
create an intermediate compiler 'aT = a(T)', and using that to produce
the compiler 'cT = aT(T)'. (One more step of 'cT = cT(a)' may be
required to obtain a fixed point binary.) Now you can compare 'cA' to
'cT(A)' and see the back door directly.
David