> On 14 Oct 2024, at 03:49, Cy Schubert <cy.schub...@cschubert.com> wrote:
>
>> It
>> can be solved, I think the DirectX LLVM backend ("DXIL") does this, but I
>> still suggest you not do this.
NaCl and SPIR made this mistake first. WebAssembly and SPIR-V learned the
lesson.
>> LLVM is huge. Really huge. A codebase that large has no business being in
>> the kernel.
Many years ago, I wrote a proof of concept BPF to LLVM IR compiler. The idea
was that a trusted userspace component could do the BPF compilation and load
binary code into the kernel. BPF would still be BPF and so have the same
guarantees, but compiling it would be faster (on average, each BPF bytecode
instruction became slightly more than one x86 instruction after LLVM
optimisations had run). LLVM
was still in the TCB though, even in userspace. I didn’t pursue it because LLVM
is *not* safe in the presence of untrusted inputs.
More generally, the LLVM IR model is similar to C. It allows arbitrary pointer
casts and arbitrary pointer arithmetic. It is not a good starting point for
anything that you want to analyse for security. LLVM analyses take advantage of
undefined behaviour. An in-bounds address calculation instruction is an
assertion from the front end that the result will be in bounds. Optimisations
are free to rely on this, even when they can’t prove it, because it is
undefined behaviour to claim something is in bounds when it is not. The same is
true of a lot of other properties of the IR. Many cannot be recovered after the
fact; they rely on translation from a higher-level language that enforces the
properties by construction.
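
As a rough illustration of the in-bounds point, here is a C analogue rather
than real LLVM IR. The function names load_trusting and load_checked are made
up for this sketch. The first function takes the caller's word that the index
is valid, much as an in-bounds address calculation takes the front end's word.
The second establishes the property itself before touching memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the caller promises idx is within the buffer.
 * If that promise is false the behaviour is undefined, and a compiler
 * is entitled to optimise as though the promise always holds. */
uint8_t load_trusting(const uint8_t *buf, size_t idx)
{
    return buf[idx];
}

/* Hypothetical helper: nothing is assumed about idx. The bound is
 * checked at run time, so the access is safe by construction.
 * Returns 0 and stores the byte in *out on success, -1 otherwise. */
int load_checked(const uint8_t *buf, size_t len, size_t idx, uint8_t *out)
{
    assert(buf != NULL && out != NULL);
    if (idx >= len)
        return -1;          /* out of bounds: refuse, leave *out alone */
    *out = buf[idx];
    return 0;
}
```

Given only the body of load_trusting, nothing can reconstruct the length the
caller had in mind. That is the sense in which such properties have to be
enforced by whatever generated the code, not rediscovered afterwards.
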
David