| Field | Value |
| --- | --- |
| Issue | 158585 |
| Summary | [MC][x86-64] Clang silently mis-assembles instructions with high-byte registers and `EVEX/VEX` prefixes in release mode and crashes with assertions enabled |
| Labels | clang |
| Assignees | |
| Reporter | venkyqz |

# Description
When using Clang's integrated assembler (LLVM MC), a critical issue arises when assembling an instruction that both requires an `EVEX` or `VEX` prefix and uses a high-byte register (`ah`, `ch`, `dh`, or `bh`). The x86 encoding has no way to name a high-byte register in such an instruction, so no valid encoding exists. In release builds, this incompatibility leads to silent mis-assembly.
This results in a dangerous discrepancy between different build modes:
+ In `debug/assertion builds`: The assembler's internal checks correctly identify the invalid instruction. This leads to a crash with an informative error message: "LLVM ERROR: Cannot encode high byte register in VEX/EVEX-prefixed instruction." This behavior, although it causes a crash, is useful for development as it clearly signals a fatal error.
+ In `release builds`: The internal checks are disabled. The assembler silently proceeds with the invalid instruction, resulting in a **silent mis-assembly.** This behavior is problematic and dangerous as it produces a defective executable without any warning or error. The resulting binary's behavior is unpredictable, potentially leading to security vulnerabilities or silent data corruption.
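A small model of the register-field problem may help explain why no valid encoding exists. The sketch below is a simplified illustration, not LLVM's code. It assumes that an EVEX prefix behaves like the REX prefix, which reinterprets byte-register encodings 4 to 7 as `spl`, `bpl`, `sil`, and `dil` instead of `ah`, `ch`, `dh`, and `bh`. This is documented for REX and consistent with the GAS error message for EVEX. The table names and the helpers `encode_byte_reg` and `decode_byte_reg` are hypothetical, and it has not been verified that the release build emits exactly this encoding.

```python
# Hypothetical model of the 3-bit register field for 8-bit GPR operands.
# REX.R/REX.B (and EVEX) register-extension bits are ignored for simplicity.

# Without any REX-like prefix, encodings 4-7 select the high-byte registers.
LEGACY_BYTE_REGS = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

# With a REX prefix, encodings 4-7 select the low bytes of rsp/rbp/rsi/rdi.
# Assumption: an EVEX prefix behaves the same way here.
PREFIXED_BYTE_REGS = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"]


def encode_byte_reg(name: str) -> int:
    """Naive encoder: looks the register up in the legacy table only."""
    return LEGACY_BYTE_REGS.index(name)


def decode_byte_reg(field: int, has_rex_like_prefix: bool) -> str:
    """Returns the register a decoder would see for a 3-bit field value."""
    table = PREFIXED_BYTE_REGS if has_rex_like_prefix else LEGACY_BYTE_REGS
    return table[field]


if __name__ == "__main__":
    for reg in ("ah", "ch", "dh", "bh"):
        field = encode_byte_reg(reg)
        decoded = decode_byte_reg(field, has_rex_like_prefix=True)
        print(f"{reg} -> field {field} -> decoded as {decoded}")
    # ah -> field 4 -> decoded as spl
    # ch -> field 5 -> decoded as bpl
    # dh -> field 6 -> decoded as sil
    # bh -> field 7 -> decoded as dil
```

Under this model, an encoder that skips the check would turn an operand written as `ah` into one the processor reads as `spl`. That would be consistent with the silent data corruption described above, but it remains a hypothesis about what the release build actually emits.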
# Impact and Scope
This bug is not limited to a single instruction; it affects a broad range of opcodes. Any instruction that the assembler encodes with an EVEX or VEX prefix exhibits this issue when a high-byte register is used. This includes:
+ Instructions that exist only in EVEX-encoded form: the entire `ccmp*` and `ctest*` families, introduced by Intel APX (Advanced Performance Extensions), are directly affected.
+ Legacy instructions that also have EVEX-encoded forms: many traditional opcodes (`add`, `sub`, `and`, `or`, etc.) are affected when the assembler selects an EVEX encoding for them, for example the APX new-data-destination or no-flags forms.

The list of potentially affected opcodes is extensive, including but not limited to:
> adc, add, and
>
> All ccmp* and ctest* instructions
>
> dec, inc, neg, not
>
> or, sbb, shl, shr
>
> All setzu* instructions
>
> sub, xor
# Expected Behavior
An assembler should always be robust and predictable. The correct behavior is to fail consistently with an explicit error message when given an instruction that cannot be encoded, regardless of the build mode. GNU as (binutils) demonstrates this: it rejects the instruction with the clear error "Error: can't encode register 'ah' in an instruction requiring EVEX prefix." Silent mis-assembly in release mode amounts to undefined behavior and is unacceptable.
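The difference between an assertion and an always-on diagnostic can be sketched in a few lines. This is a hypothetical illustration in Python, not LLVM's implementation: the names `check_high_byte_regs` and `EncodingError` are invented for this example, and the error text simply mirrors the GAS message quoted above. Python drops `assert` statements when run with `python -O`, much as C and C++ `assert()` compiles away when `NDEBUG` is defined in release builds, so the check uses an explicit `raise` instead.

```python
HIGH_BYTE_REGS = frozenset({"ah", "ch", "dh", "bh"})


class EncodingError(Exception):
    """Hypothetical error for instructions that have no valid encoding."""


def check_high_byte_regs(operands: list[str], prefix: str) -> None:
    """Reject high-byte registers in VEX/EVEX-prefixed instructions.

    An explicit raise is used instead of `assert`, so the check still
    runs under `python -O`, the analogue of a release build.
    """
    if prefix.upper() not in ("VEX", "EVEX"):
        return  # legacy encodings without REX can name ah/ch/dh/bh
    for op in operands:
        if op.lower() in HIGH_BYTE_REGS:
            raise EncodingError(
                f"can't encode register '{op}' in an instruction "
                f"requiring {prefix.upper()} prefix"
            )


if __name__ == "__main__":
    try:
        check_high_byte_regs(["ah", "bl"], "EVEX")
    except EncodingError as err:
        print(f"Error: {err}")
    # Error: can't encode register 'ah' in an instruction requiring EVEX prefix
```

The design choice being illustrated is only that the diagnostic must not depend on how the tool was built; where the equivalent check belongs inside LLVM MC is left to the maintainers.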
# Ways to Reproduce
Godbolt Link: https://godbolt.org/z/jhP3o4v8q
# Observation
+ `Clang 17.0.1` is the latest version that does not exhibit this problem.
+ Trunk binutils GAS behaves correctly and rejects the instruction.
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs