| Field | Value |
| --- | --- |
| Issue | 172172 |
| Summary | [missed-opt] [x86_64] Suboptimal movzx after inline assembly returning a byte |
| Labels | new issue |
| Assignees | |
| Reporter | purplesyringa |

Found while trying to implement a fast black-box primitive.
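
For context, here is a minimal sketch of what such a black-box primitive might look like. The name `black_box` and the empty assembly template are assumptions made for illustration and are not taken from the report; the sketch relies on GCC/Clang extended inline assembly and only works for types that fit in a general-purpose register.

```cpp
// Hypothetical helper (not from the report): passes `value` through an empty
// inline-assembly statement. The "+r" constraint tells the compiler that the
// register holding `value` is both read and written by the assembly, so the
// optimizer can no longer assume anything about the value that comes out.
template <typename T>
inline T black_box(T value) {
    asm("" : "+r"(value));  // emits no instructions
    return value;
}
```

At run time the value is unchanged, for example `black_box<char>(5)` still returns `5`; the point is only what the optimizer is allowed to assume about it.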

[Godbolt](https://godbolt.org/z/K3j7PKPK8)

```cpp
char f(char x) {
    asm("nop" : "+r"(x));
    return x * 3;
}

short g(short x) {
    asm("nop" : "+r"(x));
    return x * 3;
}

char h(char x) {
    return x * 3;
}
```

```asm
f(char):
        nop
        movzx   eax, dil
        lea     eax, [rax + 2*rax]
        ret

g(short):
        nop
        lea     eax, [rdi + 2*rdi]
        ret

h(char):
        lea     eax, [rdi + 2*rdi]
        ret
```

The `movzx eax, dil` in `f` can be omitted, since only the low 8 bits of the result are returned and they depend only on the low 8 bits of the input. GCC does, indeed, omit it. I initially thought this was some kind of dependency-breaking optimization, but I'm no longer sure. For one thing, it is not done for 16-bit values (`g`), which would seemingly suffer from the same issue. It is also not done in `h`, where the input to `lea` is the function argument, whose upper bits are undefined per the psABI. If this is an optimization attempt, it looks more like a pessimization when it follows inline assembly, which the author has presumably made as efficient as possible, and there is no way to opt out of the zero-extension.
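
The claim that the zero-extension is redundant can be checked with a small sketch. It models the register as a 32-bit integer and compares the returned byte for a zero-extended input against an input whose upper 24 bits hold arbitrary garbage. The helper names `times3_low_byte` and `upper_bits_are_irrelevant` are hypothetical, and the model ignores everything except the arithmetic.

```cpp
#include <cstdint>

// Same arithmetic as `f` without the inline asm: multiply the whole 32-bit
// register by 3 and keep only the low 8 bits, as the `char` return does.
inline std::uint8_t times3_low_byte(std::uint32_t reg) {
    return static_cast<std::uint8_t>(reg * 3u);
}

// Hypothetical check: for every possible low byte, the result must be the
// same whether the upper 24 bits are zero (what movzx produces) or taken
// from `garbage` (what the register may hold without movzx).
inline bool upper_bits_are_irrelevant(std::uint32_t garbage) {
    for (std::uint32_t low = 0; low < 256; ++low) {
        std::uint32_t zero_extended = low;
        std::uint32_t dirty = (garbage & 0xFFFFFF00u) | low;
        if (times3_low_byte(zero_extended) != times3_low_byte(dirty)) {
            return false;
        }
    }
    return true;
}
```

The check passes for any garbage pattern because multiplication modulo 2^8 depends only on the operands modulo 2^8, which is why the byte returned by `f` would be the same with or without the `movzx`.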