On Thu, May 30, 2024 at 01:24:56AM UTC, Andrew Cooper wrote:
> On 29/05/2024 3:20 am, Gatlin Newhouse wrote:
> > diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
> > index a3ec87d198ac..e3fbed9073f8 100644
> > --- a/arch/x86/include/asm/bug.h
> > +++ b/arch/x86/include/asm/bug.h
> > @@ -13,6 +13,14 @@
> > #define INSN_UD2 0x0b0f
> > #define LEN_UD2 2
> >
> > +/*
> > + * In clang we have UD1s reporting UBSAN failures on X86, 64 and 32bit.
> > + */
> > +#define INSN_UD1 0xb90f
> > +#define LEN_UD1 2
> > +#define INSN_REX 0x67
> > +#define LEN_REX 1
>
> That's an address size override prefix, not a REX prefix.

Good to know, thanks.

> What information is actually encoded in this UD1 instruction? I can't
> find anything any documentation which actually discusses how the ModRM
> byte is encoded.

lib/ubsan.h has a comment before the ubsan_checks enum that links to line 113
of LLVM's clang/lib/CodeGen/CodeGenFunction.h, which defines the values for
the ModRM byte. I think the Undefined Behavior Sanitizer pass does the actual
encoding of UB type to values, but I'm not an expert in LLVM.

> ~Andrew
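
To make the layout under discussion concrete, here is a minimal userspace
sketch of how a decoder might skip an optional 0x67 address-size override
prefix, match the two UD1 opcode bytes, and locate the ModRM byte that
follows. It is not the code from the patch. The helper ud1_modrm_offset() and
the INSN_ASOP/LEN_ASOP names are hypothetical, chosen only to reflect the
prefix correction above; the opcode constants come from the quoted diff. How
the bytes after the opcode map to UBSAN check types is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

#define INSN_UD1	0xb90f	/* bytes 0x0f 0xb9 read as a little-endian u16 */
#define LEN_UD1		2
#define INSN_ASOP	0x67	/* address-size override prefix (hypothetical name) */
#define LEN_ASOP	1

/*
 * Hypothetical helper: return the offset of the ModRM byte of a UD1
 * instruction starting at buf, or -1 if buf does not hold a UD1 (with or
 * without a leading address-size override prefix) or is too short.
 */
static int ud1_modrm_offset(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	uint16_t opcode;

	/* Skip the optional address-size override prefix. */
	if (len > 0 && buf[0] == INSN_ASOP)
		off += LEN_ASOP;

	/* Need the two opcode bytes plus at least the ModRM byte. */
	if (len < off + LEN_UD1 + 1)
		return -1;

	/* Assemble the opcode the same way a little-endian u16 load would. */
	opcode = (uint16_t)(buf[off] | (buf[off + 1] << 8));
	if (opcode != INSN_UD1)
		return -1;

	return (int)(off + LEN_UD1);
}
```

Called on a buffer that begins with 0x67 0x0f 0xb9, the helper returns 3; on
a buffer that begins with 0x0f 0xb9 it returns 2; on a UD2 (0x0f 0x0b) it
returns -1.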