| Issue | 117304 |
|---|---|
| Summary | [x86][MC] Fail to decode some long multi-byte NOPs |
| Reporter | Mar3yZhang |
### Work environment
| Questions | Answers
|------------------------------------------|--------------------
| OS/arch/bits | x86_64 Ubuntu 20.04
| Architecture | x86_64
| Source of LLVM | `git clone` of llvm-project (default branch)
| Version/git commit | llvm-20git, [f08278](https://github.com/llvm/llvm-project/commit/f082782c1b3ec98f50237ddfc92e6776013bf62f)
### Minimal disassembler PoC program
```c
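#include <err.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <llvm-c/Disassembler.h>
#include <llvm-c/Target.h>

#define MAX_INPUT_BYTES 15 /* an x86 instruction is at most 15 bytes long */
#define MAX_OUTPUT_LENGTH 256

int main(int argc, char *argv[]){
    /* Input sanity check: argv[1] must be an even-length hex string. */
    if (argc != 2) {
        errx(1, "usage: %s <hex bytes, e.g. 0f1ade>", argv[0]);
    }
    const char *hex = argv[1];
    size_t hex_len = strlen(hex);
    if (hex_len == 0 || hex_len % 2 != 0 || hex_len / 2 > MAX_INPUT_BYTES ||
        strspn(hex, "0123456789abcdefABCDEF") != hex_len) {
        errx(1, "Error: expected 1-%d bytes as a hex string.", MAX_INPUT_BYTES);
    }
    uint8_t raw_bytes[MAX_INPUT_BYTES];
    size_t bytes_len = hex_len / 2;
    for (size_t i = 0; i < bytes_len; i++) {
        unsigned int byte;
        sscanf(hex + 2 * i, "%2x", &byte); /* digits already validated above */
        raw_bytes[i] = (uint8_t)byte;
    }
    // Initialize LLVM after input validation
    LLVMInitializeAllTargetInfos();
    LLVMInitializeAllTargets();
    LLVMInitializeAllTargetMCs();
    LLVMInitializeAllDisassemblers();
    LLVMDisasmContextRef disasm = LLVMCreateDisasm("x86_64", NULL, 0, NULL, NULL);
    if (!disasm) {
        errx(1, "Error: LLVMCreateDisasm() failed.");
    }
    // Set disassembler options: print immediates as hex, use Intel syntax
    if (!LLVMSetDisasmOptions(disasm, LLVMDisassembler_Option_PrintImmHex |
                                      LLVMDisassembler_Option_AsmPrinterVariant)) {
        errx(1, "Error: LLVMSetDisasmOptions() failed.");
    }
    char output_string[MAX_OUTPUT_LENGTH];
    uint64_t address = 0;
    size_t instr_len = LLVMDisasmInstruction(disasm, raw_bytes, bytes_len, address,
                                             output_string, sizeof(output_string));
    if (instr_len > 0) {
        printf("%s\n", output_string);
    } else {
        printf("Error: Unable to disassemble the input bytes.\n");
    }
    LLVMDisasmDispose(disasm); // release the disassembler context
    return 0;
}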
```
### Instruction bytes giving faulty results
```
0f 1a de
```
### Expected results
It should be:
```
nop esi, ebx
```
### Actual results
```sh
$./min_llvm_disassembler "0f1ade"
Error: Unable to disassemble the input bytes.
```
### Other cases seem to work
```sh
$./min_llvm_disassembler "0f1f00"
nop dword ptr [rax]
```
### Additional context
Instructions with opcodes `0f 18` through `0f 1f` are defined as multi-byte NOP (No Operation) instructions in the x86 ISA, except where a specific encoding is assigned to another instruction (for example, the memory forms of `0f 18` are PREFETCH hints). See this [StackOverflow post](https://stackoverflow.com/questions/25545470/long-multi-byte-nops-commonly-understood-macros-or-other-notation) for more details. The bytes `0f 1a de` should therefore decode as follows:
- `0f 1a` is the two-byte opcode: the `0f` escape byte followed by `1a`.
- The ModR/M byte `0xde` is binary `11011110`:
  - Bits 7-6 (Mod): `11` = 3, which selects register-direct addressing.
  - Bits 5-3 (Reg): `011` = 3, which encodes EBX (RBX with a REX.W prefix).
  - Bits 2-0 (R/M): `110` = 6, which encodes ESI (RSI with a REX.W prefix).

Intel XED likewise decodes `0f 1a de` as `nop esi, ebx`.