https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/76261

This patch introduces the runtime components of a type sanitizer: a sanitizer 
for type-based aliasing violations.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these 
given TBAA metadata added by Clang. Roughly, a pointer of a given type cannot be 
used to access an object of a different type (with, of course, certain 
exceptions). Unfortunately, there's a lot of code in the wild that violates 
these rules (e.g. for type punning), and such code often must be built with 
-fno-strict-aliasing. Performance is often sacrificed as a result. Part of the 
problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer 
will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the 
corresponding instrumentation pass generates descriptor tables. Thus, for each 
type (and access descriptor), we have a unique pointer representation. 
Excepting anonymous-namespace types, these tables are comdat, so the pointer 
values should be unique across the program. The descriptors refer to other 
descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). 
The instrumentation handles the "fast path" (where the types match exactly and 
no partial overlaps are detected), and defers to the runtime to handle all of 
the more complicated cases. The runtime, of course, is also responsible for 
reporting errors when those are detected.
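To make the "type aliasing tree" concrete, here is a minimal C++ sketch. The struct layout (a `TypeDescriptor` holding only a name and a `parent` pointer) and the helper `mayAlias` are hypothetical illustrations, not the tables the pass actually emits, and they cover scalar types only. The point they show is that each type is identified purely by the address of its descriptor, and that two scalar types may alias when one lies on the other's path to the root.

```cpp
#include <cassert>

// Hypothetical descriptor: each type points at its parent in the aliasing
// tree. Identity is by address, mirroring the "unique pointer per type" idea.
struct TypeDescriptor {
  const char *name;
  const TypeDescriptor *parent;  // nullptr for the root
};

// Returns true if `anc` is `t` itself or one of t's ancestors.
bool isAncestorOrSelf(const TypeDescriptor *anc, const TypeDescriptor *t) {
  for (; t != nullptr; t = t->parent)
    if (t == anc)
      return true;
  return false;
}

// Scalar-only rule: two types may alias if one lies on the other's path
// to the root of the tree.
bool mayAlias(const TypeDescriptor *a, const TypeDescriptor *b) {
  assert(a != nullptr && b != nullptr && "descriptors must be non-null");
  return isAncestorOrSelf(a, b) || isAncestorOrSelf(b, a);
}

// A tiny tree modeled on Clang's scalar TBAA: root -> char -> {int, float}.
const TypeDescriptor kRoot{"Simple C++ TBAA", nullptr};
const TypeDescriptor kChar{"omnipotent char", &kRoot};
const TypeDescriptor kInt{"int", &kChar};
const TypeDescriptor kFloat{"float", &kChar};
```

With this tree, `mayAlias(&kInt, &kFloat)` is false, while `char` aliases both `int` and `float` because it is their common parent.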

The runtime uses essentially the same shadow memory region as tsan, and we use 
8 bytes of shadow memory, the size of the pointer to the type descriptor, for 
every byte of accessed data in the program. The value 0 is used to represent an 
unknown type. The value -1 is used to represent an interior byte (a byte that 
is part of a type, but not the first byte). The instrumentation first checks 
for an exact match between the type of the current access and the type for that 
address recorded in the shadow memory. If it matches, it then checks the shadow 
for the remainder of the bytes in the type to make sure that they're all -1. If 
not, we call the runtime. If the exact match fails, we next check if the value 
is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of 
the bytes in the type (to make sure they're all 0). If they're not, we call the 
runtime. We then set the shadow for the access address to the access type's 
descriptor and set the shadow for the remaining bytes in the type to -1 (i.e. 
marking them as interior bytes). If 
the type indicated by the shadow memory for the access address is neither an 
exact match nor 0, we call the runtime.
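The fast-path logic described above can be simulated in C++ as follows. This is a sketch under stated assumptions, not the instrumentation itself: the real pass emits IR that addresses a tsan-style shadow region, whereas the hypothetical `ShadowMemory` here keeps one pointer-sized slot per byte of a small simulated heap, indexed by offset. `kUnknown` and `kInterior` stand for the 0 and -1 values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One pointer-sized shadow slot per application byte.
using Shadow = std::uintptr_t;
constexpr Shadow kUnknown = 0;                         // type not yet known
constexpr Shadow kInterior = static_cast<Shadow>(-1);  // non-first byte of a typed object

// Stand-in for a type descriptor; only its address matters.
struct TypeDescriptor { const char *name; };

enum class Outcome { FastPathOk, CallRuntime };

// Hypothetical simulation of the shadow for a small heap.
struct ShadowMemory {
  std::vector<Shadow> slots;
  explicit ShadowMemory(std::size_t appBytes) : slots(appBytes, kUnknown) {}

  // Fast-path check for an access of `size` bytes at `offset` with type `td`.
  Outcome checkAccess(std::size_t offset, std::size_t size,
                      const TypeDescriptor *td) {
    assert(size > 0 && offset + size <= slots.size());
    const Shadow want = reinterpret_cast<Shadow>(td);
    Shadow *s = &slots[offset];
    if (s[0] == want) {
      // Exact match: the remaining bytes must all be interior bytes.
      for (std::size_t i = 1; i < size; ++i)
        if (s[i] != kInterior) return Outcome::CallRuntime;
      return Outcome::FastPathOk;
    }
    if (s[0] == kUnknown) {
      // Unknown type: the remaining bytes must be unknown as well...
      for (std::size_t i = 1; i < size; ++i)
        if (s[i] != kUnknown) return Outcome::CallRuntime;
      // ...then record the type and mark the rest as interior bytes.
      s[0] = want;
      for (std::size_t i = 1; i < size; ++i) s[i] = kInterior;
      return Outcome::FastPathOk;
    }
    // Some other type (or an interior byte) is recorded: defer to the runtime.
    return Outcome::CallRuntime;
  }
};
```

For example, a 4-byte `int` access at offset 0 of fresh shadow records the type and marks offsets 1 to 3 as interior; a later `float` access at offset 0, or an `int` access starting at offset 2, returns `Outcome::CallRuntime`.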

The instrumentation pass inserts calls to the memset intrinsic that reset the 
shadow memory for memory written by memset, memcpy, and memmove, as well as for 
allocas and byval arguments (and at lifetime.start/end), to reflect that the 
type of that memory is now unknown. The runtime intercepts memset, memcpy, etc. 
to do the same for the corresponding library calls.
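A minimal sketch of this shadow reset, again over a simulated heap. `Heap`, `resetShadow`, and `copy` are hypothetical names chosen for illustration; the real runtime hooks the library functions through its interceptor machinery rather than through a wrapper like this one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Shadow = std::uintptr_t;
constexpr Shadow kUnknown = 0;

// Hypothetical simulated heap: application bytes plus one shadow slot per byte.
struct Heap {
  std::vector<unsigned char> bytes;
  std::vector<Shadow> shadow;
  explicit Heap(std::size_t n) : bytes(n, 0), shadow(n, kUnknown) {}

  // Counterpart of the inserted memset on the shadow: forget any recorded type.
  void resetShadow(std::size_t offset, std::size_t size) {
    assert(offset + size <= shadow.size());
    std::memset(shadow.data() + offset, 0, size * sizeof(Shadow));
  }

  // What an intercepted memcpy/memmove does in this sketch: perform the copy,
  // then mark the destination's type as unknown.
  void copy(std::size_t dst, std::size_t src, std::size_t n) {
    assert(dst + n <= bytes.size() && src + n <= bytes.size());
    std::memmove(bytes.data() + dst, bytes.data() + src, n);
    resetShadow(dst, n);
  }
};
```

Resetting to 0 rather than copying the source's shadow follows the behavior described above: after the copy, the next typed access to the destination simply takes the "unknown" fast path and records its own type.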

The runtime essentially repeats these checks, but uses the full TBAA algorithm, 
just as the compiler does, to determine when two types are permitted to alias. 
In a situation where access overlap has occurred and aliasing is not permitted, 
an error is generated.
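The runtime's handling of an interior-byte hit can be illustrated with a simplified struct-path check. Everything here is a hypothetical sketch and not the full TBAA algorithm the runtime implements: `TypeDesc` carries a parent (for scalars) or a member list with byte offsets (for structs), and `slowPathCheck` only examines the object enclosing the first byte of the access. It walks back over interior (-1) slots to find the recorded type, then descends through members until it reaches the accessed offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TypeDesc;
struct Member {
  std::size_t offset;     // byte offset of the member within its struct
  const TypeDesc *type;
};
// Hypothetical descriptor: scalars use `parent`, structs list `members`.
struct TypeDesc {
  const char *name;
  const TypeDesc *parent;        // scalar aliasing tree (nullptr at root)
  std::vector<Member> members;   // sorted by offset; empty for scalars
};

using Shadow = std::uintptr_t;
constexpr Shadow kUnknown = 0;
constexpr Shadow kInterior = static_cast<Shadow>(-1);

bool isAncestorOrSelf(const TypeDesc *anc, const TypeDesc *t) {
  for (; t != nullptr; t = t->parent)
    if (t == anc) return true;
  return false;
}

bool scalarMayAlias(const TypeDesc *a, const TypeDesc *b) {
  return isAncestorOrSelf(a, b) || isAncestorOrSelf(b, a);
}

// May an access of type `access` at byte `offset` inside an object of type
// `stored` proceed? Descends into the member that covers the offset.
bool accessAllowed(const TypeDesc *stored, std::size_t offset,
                   const TypeDesc *access) {
  if (offset == 0 && (stored == access || scalarMayAlias(stored, access)))
    return true;
  const Member *best = nullptr;
  for (const Member &m : stored->members)
    if (m.offset <= offset && (best == nullptr || m.offset > best->offset))
      best = &m;
  return best != nullptr && accessAllowed(best->type, offset - best->offset, access);
}

// Returns true if permitted, false if an error should be reported.
bool slowPathCheck(const std::vector<Shadow> &shadow, std::size_t offset,
                   const TypeDesc *access) {
  assert(offset < shadow.size());
  std::size_t start = offset;
  while (start > 0 && shadow[start] == kInterior)
    --start;  // walk back to the first byte of the enclosing object
  if (shadow[start] == kUnknown || shadow[start] == kInterior)
    return true;  // no recorded type to compare against (permitted here)
  const auto *stored = reinterpret_cast<const TypeDesc *>(shadow[start]);
  return accessAllowed(stored, offset - start, access);
}
```

With `struct S { int a; float b; }` recorded at offset 0 (and offsets 1 to 7 marked interior), a `float` access at offset 4 is permitted, while an `int` access at offset 4 or a `float` access at offset 0 is rejected.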

Clang's TBAA representation currently has a problem representing unions, as 
demonstrated by the one XFAIL'd test. We'll update the TBAA representation to 
fix this, and at the same time, update the sanitizer.

As a note, this implementation does not use the compressed shadow-memory scheme 
discussed previously 
(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme 
would not handle the struct-path (i.e. structure offset) information that our 
TBAA represents. I expect we'll want to do further work on compressing the 
shadow-memory representation, but I think it makes sense to do that as 
follow-up work.

(This includes build fixes for Linux from Mingjie Xu)

Based on https://reviews.llvm.org/D32197.



_______________________________________________
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
