Issue 117304
Summary [x86][MC] Fail to decode some long multi-byte NOPs
Labels
Assignees
Reporter Mar3yZhang
### Work environment

| Questions                                | Answers
|------------------------------------------|--------------------
| OS/arch/bits                             | x86_64 Ubuntu 20.04
| Architecture                             | x86_64
| Source of LLVM                           | `git clone`, default on `main` branch.
| Version/git commit                   | llvm-20git, [f08278](https://github.com/llvm/llvm-project/commit/f082782c1b3ec98f50237ddfc92e6776013bf62f)

<!-- INCORRECT DISASSEMBLY BUGS -->

### Minimal disassembler PoC program
```c
#include <err.h>
#include <stdint.h>
#include <stdio.h>

#include <llvm-c/Disassembler.h>
#include <llvm-c/Target.h>

int main(int argc, char *argv[]) {
    /*
       some input sanity check of hex string from argv
       (raw_bytes, bytes_len and MAX_OUTPUT_LENGTH come from this elided part)
    */
    // Initialize LLVM after input validation
    LLVMInitializeAllTargetInfos();
    LLVMInitializeAllTargets();
    LLVMInitializeAllTargetMCs();
    LLVMInitializeAllDisassemblers();

    LLVMDisasmContextRef disasm = LLVMCreateDisasm("x86_64", NULL, 0, NULL, NULL);
    if (!disasm) {
        errx(1, "Error: LLVMCreateDisasm() failed.");
    }

    // Set disassembler options: print immediates as hex, use Intel syntax
    if (!LLVMSetDisasmOptions(disasm, LLVMDisassembler_Option_PrintImmHex |
                                          LLVMDisassembler_Option_AsmPrinterVariant)) {
        errx(1, "Error: LLVMSetDisasmOptions() failed.");
    }

    char output_string[MAX_OUTPUT_LENGTH];
    uint64_t address = 0;
    // Returns the number of bytes consumed, or 0 if the bytes cannot be decoded
    size_t instr_len = LLVMDisasmInstruction(disasm, raw_bytes, bytes_len, address,
                                             output_string, sizeof(output_string));

    if (instr_len > 0) {
        printf("%s\n", output_string);
    } else {
        printf("Error: Unable to disassemble the input bytes.\n");
    }

    LLVMDisasmDispose(disasm);
    return 0;
}
```

### Instruction bytes giving faulty results

```
0f 1a de
```

### Expected results

It should be:
```
nop esi, ebx
```

### Actual results

```sh
$./min_llvm_disassembler "0f1ade"
Error: Unable to disassemble the input bytes.
```

### Other cases seem to work
```sh
$./min_llvm_disassembler "0f1f00"
nop     dword ptr [rax]
```

<!-- ADDITIONAL CONTEXT -->

### Additional logs, screenshots, source code, configuration dump, ...
Instructions with opcodes ranging from `0f 18` to `0f 1f` are defined as multi-byte NOP (No Operation) instructions in the x86 ISA. Please refer to the [StackOverflow post](https://stackoverflow.com/questions/25545470/long-multi-byte-nops-commonly-understood-macros-or-other-notation) for more details. The bytes `0f 1a de` should be decoded as follows:
- `0f 1a` is the two-byte (escape-prefixed) opcode.
- The ModR/M byte `de` is `11011110` in binary.
    - Bits 7-6 (Mod): `11` (binary) = 3 (decimal).
      Indicates register-direct addressing mode.
    - Bits 5-3 (Reg): `011` (binary) = 3 (decimal).
      Corresponds to the EBX (or RBX in 64-bit mode) register.
    - Bits 2-0 (R/M): `110` (binary) = 6 (decimal).
      Corresponds to the ESI (or RSI in 64-bit mode) register.
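
The decoding steps above can be sketched in a few lines of C. This is only an illustration of the bit arithmetic, not LLVM's decoder: the names `ModRM`, `split_modrm`, `in_nop_opcode_range` and `format_reg_nop` are hypothetical, the sketch assumes no prefixes (no REX, no operand-size override) and handles only the register-direct form, and the operand order (`r/m` first, then `reg`) simply follows the expected output quoted in this report.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of an x86 ModR/M byte (hypothetical helper type). */
typedef struct {
    uint8_t mod; /* bits 7-6 */
    uint8_t reg; /* bits 5-3 */
    uint8_t rm;  /* bits 2-0 */
} ModRM;

/* 32-bit general-purpose register names indexed by their 3-bit encoding,
   assuming no REX prefix is present. */
static const char *const kReg32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Split a ModR/M byte into its three fields. */
ModRM split_modrm(uint8_t byte) {
    ModRM m;
    m.mod = (uint8_t)(byte >> 6);
    m.reg = (uint8_t)((byte >> 3) & 0x7);
    m.rm  = (uint8_t)(byte & 0x7);
    return m;
}

/* Return 1 if the two opcode bytes fall in the 0f 18 .. 0f 1f range. */
int in_nop_opcode_range(uint8_t b0, uint8_t b1) {
    return b0 == 0x0f && b1 >= 0x18 && b1 <= 0x1f;
}

/* Format the register-direct form as "nop <r/m>, <reg>".
   Returns 0 on success, -1 if the input is not handled by this sketch
   (too short, opcode outside the range, or Mod != 3). */
int format_reg_nop(const uint8_t *bytes, size_t len, char *out, size_t out_len) {
    if (len < 3 || !in_nop_opcode_range(bytes[0], bytes[1])) {
        return -1;
    }
    ModRM m = split_modrm(bytes[2]);
    if (m.mod != 3) {
        return -1; /* memory operand forms are out of scope here */
    }
    snprintf(out, out_len, "nop %s, %s", kReg32[m.rm], kReg32[m.reg]);
    return 0;
}
```

Calling `format_reg_nop` on the bytes `0f 1a de` splits `de` into Mod = 3, Reg = 3, R/M = 6 and writes `nop esi, ebx` into the output buffer, matching the expected result above.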

XED also translates "0f 1a de" into "nop esi, ebx".
