> On 14 Oct 2024, at 03:49, Cy Schubert <cy.schub...@cschubert.com> wrote:
> 
>>  It
>> can be solved, I think the DirectX LLVM backend ("DXIL") does this, but I
>> still suggest you not do this.

NaCl and SPIR made this mistake first. WebAssembly and SPIR-V learned the 
lesson.

>> LLVM is huge. Really huge. A codebase that large has no business being in
>> the kernel.

Many years ago, I wrote a proof of concept BPF to LLVM IR compiler. The idea 
was that a trusted userspace component could do the BPF compilation and load 
binary code into the kernel. BPF would still be BPF and so have the same 
guarantees, but compiling it would be faster (on average, each BPF bytecode
instruction became slightly more than one x86 instruction after LLVM
optimisations had run). LLVM
was still in the TCB though, even in userspace. I didn’t pursue it because LLVM 
is *not* safe in the presence of untrusted inputs.

More generally, the LLVM IR model is similar to C. It allows arbitrary pointer 
casts and arbitrary pointer arithmetic. It is not a good starting point for 
anything that you want to analyse for security. LLVM analyses take advantage of 
undefined behaviour. An in-bounds address calculation instruction is an 
assertion from the front end that the result will be in bounds. Optimisations 
are free to rely on this, even when they can’t prove it, because it is 
undefined behaviour to claim something is in bounds when it is not. The same is 
true of many other properties of the IR. Many of them cannot be recovered by
analysis after the fact; they rely on translation from a higher-level language
that enforces the properties by construction.
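
To make the in-bounds point concrete, here is a rough toy model in Python. It 
is not LLVM IR and not any real LLVM pass: the names Gep, CheckedLoad, RawLoad, 
optimise and execute are all invented for this sketch, and the real in-bounds 
rules are more subtle. It only shows the shape of the problem: the optimiser 
trusts a flag it cannot verify, so a front end that lies turns a trapped access 
into a silent out-of-bounds read.

```python
from dataclasses import dataclass


@dataclass
class Gep:
    """Toy address calculation: element `index` of some array."""
    index: int
    inbounds: bool  # unverified promise from the front end


@dataclass
class CheckedLoad:
    """Load guarded by a run-time bounds check."""
    gep: Gep


@dataclass
class RawLoad:
    """Load with no bounds check at all."""
    gep: Gep


def optimise(op):
    """Remove the bounds check when the address is flagged in bounds.

    Nothing is proved here; the pass simply relies on the flag.
    """
    if isinstance(op, CheckedLoad) and op.gep.inbounds:
        return RawLoad(op.gep)
    return op


def execute(op, memory, base, length):
    """Run one load; the array occupies memory[base:base + length]."""
    if isinstance(op, CheckedLoad) and not 0 <= op.gep.index < length:
        raise IndexError("bounds check failed")
    return memory[base + op.gep.index]


if __name__ == "__main__":
    # A four-element array followed by unrelated data.
    memory = ["a0", "a1", "a2", "a3", "SECRET"]

    # The front end claims index 4 is in bounds, which is false.
    lying = CheckedLoad(Gep(index=4, inbounds=True))

    try:
        execute(lying, memory, base=0, length=4)
    except IndexError as err:
        print("unoptimised:", err)

    print("optimised:", execute(optimise(lying), memory, base=0, length=4))
```

In this model the check can only be kept by not trusting the flag, which is 
exactly what the optimiser is entitled not to do.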

David

