> AIUI compilers have been studied so extensively that their production is
> largely automated.

Oh, no.  There are some parts we know how to automate, but by and large
it's all hand-written code.

> Create an EBNF specification, feed it through a tool
> chain (lex, yacc, cc, as, ld, etc.), and you end up with a compiler.

The EBNF specification only gives you the syntax of the source language.
It's barely sufficient for a pretty printer.  It lacks all the information
about the typing rules and dynamic semantics of the source language, as
well as any information about the syntax and semantics of the target
language, and it doesn't say anything about how to optimize the code
either.

The part you can automate with lex/yacc and friends is a tiny fraction
of a compiler, except for very naive toy compilers.

> The process is known and the results are predictable; especially with
> standards-based languages such as C. So, a skilled attacker will know
> what you're doing, how you are doing it, and may be able to produce a
> 'cA' that infects both 'A' and 'T'.

That is a risk, indeed.

> If you are going to produce source code for a trusted compiler 'T', then you
> should also produce an executable 'cT'.

That could be significantly harder.

> AIUI this can be done by writing a simplified compiler in some other
> language 'a',

Indeed.  Actually, your trusted compiler `T` doesn't even need to be
compiled with `cA` (nor written in the same source language).  It just
needs to be used somehow to compile `A` into `cA2`, which can then be
compared to `cA` to see if there's a backdoor.


        Stefan
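For illustration, here is a toy Python model of comparing `cA2` with `cA`.  It follows one concrete way of doing the "somehow", David A. Wheeler's diverse double-compiling: compile A's source with `cT` to get a stage-1 binary, use that stage-1 binary to compile A's source again, and compare the result with `cA` bit for bit.  Everything here is hypothetical: "binaries" are plain strings, the names `codegen_style`, `make_binary`, `run_compiler` and `ddc_check` are made up, and it assumes compilation is deterministic and that `cA` was built by compiling `A` with itself.

```python
import hashlib

TROJAN = "|TROJAN"  # hypothetical marker for injected backdoor code


def codegen_style(compiler_source: str) -> str:
    # Toy assumption: the exact bytes a compiler emits depend on its own
    # *source* (its code generator), not on which binary it was built with.
    return hashlib.sha256(compiler_source.encode()).hexdigest()[:8]


def make_binary(style: str, program_source: str, trojan: bool = False) -> str:
    # A "binary" is modelled as: emitter style, "::", the program it implements,
    # plus an optional trojan marker.
    return f"{style}::{program_source}" + (TROJAN if trojan else "")


def run_compiler(binary: str, source: str) -> str:
    # "Execute" a compiler binary on `source` and return the binary it produces.
    _style, _, rest = binary.partition("::")
    infected = rest.endswith(TROJAN)
    own_source = rest[: -len(TROJAN)] if infected else rest
    out_style = codegen_style(own_source)
    # A trojaned compiler re-infects anything that looks like a compiler
    # (a deliberately crude toy heuristic).
    return make_binary(out_style, source, trojan=infected and "compiler" in source)


def ddc_check(source_a: str, c_a: str, c_t: str) -> bool:
    stage1 = run_compiler(c_t, source_a)   # A compiled by the trusted compiler
    c_a2 = run_compiler(stage1, source_a)  # A compiled by (A compiled by T)
    return c_a2 == c_a                     # bit-for-bit comparison with cA


if __name__ == "__main__":
    source_a = "compiler A v1"
    c_t = make_binary("tt", "trusted compiler T")
    honest = make_binary(codegen_style(source_a), source_a)
    bad = make_binary(codegen_style(source_a), source_a, trojan=True)
    print(ddc_check(source_a, honest, c_t))  # True
    print(ddc_check(source_a, bad, c_t))     # False
```

In this model the trojaned `cA` reproduces itself exactly when it recompiles `A`, so rebuilding `A` with `cA` alone never reveals it.  Only the detour through `T` produces a `cA2` that differs from it.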