https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124585
Bug ID: 124585
Summary: missed-optimization - redundant cmp instruction after sub
Product: gcc
Version: 15.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: jack.william.heard at gmail dot com
Target Milestone: ---
I'm not sure if this is of interest to you, but I found what I believe is a
missed optimization.
https://compiler-explorer.com/z/1a6T7sE5h
Compiling this code:
```
unsigned long reduce_mod_p(unsigned long a, unsigned long p) {
    if (a >= p)
        a -= p;
    return a;
}
```
Produces this assembly:
```
mov rax, rdi
sub rax, rsi
cmp rdi, rsi
cmovb rax, rdi
ret
```
The `cmp` instruction in this output is redundant. It sets the flags exactly
as a `sub` calculating (a-p) would, and the preceding `sub` has just
calculated (a-p), so the carry flag that `cmovb` reads is already set.
Desired asm output would be the same but without the cmp:
```
mov rax, rdi
sub rax, rsi
cmovb rax, rdi
ret
```
My application does a lot of modular arithmetic and this logic comes up in a
few places. I can get my desired asm by using `__builtin_sub_overflow`
(https://compiler-explorer.com/z/j397bMfKr), and this speeds up the full
application by a few percent.
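For readers who do not want to follow the link, here is a minimal sketch of what a `__builtin_sub_overflow` version of the function might look like. The function name `reduce_mod_p_builtin` is made up for illustration, and the code behind the compiler-explorer link may differ in detail. For unsigned operands the builtin returns true exactly when the subtraction wraps, i.e. when a < p, so the result matches the original function.

```c
/* Hypothetical workaround sketch; not necessarily the code behind the link. */
unsigned long reduce_mod_p_builtin(unsigned long a, unsigned long p) {
    unsigned long diff;
    /* For unsigned types the builtin returns true when a - p wraps around,
       which happens exactly when a < p. In that case a is already reduced. */
    if (__builtin_sub_overflow(a, p, &diff))
        return a;
    /* Otherwise a >= p and diff holds a - p. */
    return diff;
}
```

The subtraction and the comparison are expressed as one operation here, which is presumably why the compiler can reuse the flags from the `sub` instead of emitting a separate `cmp`.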
I can reproduce this locally with `gcc -O3 -march=znver3 -masm=intel -S
mod_minimal_repr.c` using gcc version
`gcc (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0`, but I imagine the
compiler-explorer link is more useful. It appears to happen across multiple
compiler versions and multiple `-march` values.
Hopefully I have explained this clearly and it is of interest. Let me know if
there's any other info you need or if there's anything I can do to help.