https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102929
Bug ID: 102929
Summary: [missed optimization] two ways to rounddown-to-next-multiple
Product: gcc
Version: 11.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jengelh at inai dot de
Target Milestone: ---

Input
=====
unsigned long calc(unsigned long x, unsigned long y)
{
        return x/y*y;
}
unsigned long calc2(unsigned long x, unsigned long y)
{
        return x - x % y;
}

Observed
========
» g++ -O3 -c x.c; objdump -Mintel -d x.o
gcc version 11.2.1 20210816 [revision 056e324ce46a7924b5cf10f61010cf9dd2ca10e9]
(SUSE Linux) x86_64

0000000000000000 <_Z4calcmm>:
   0:   48 89 f8                mov    rax,rdi
   3:   31 d2                   xor    edx,edx
   5:   48 f7 f6                div    rsi
   8:   48 0f af c6             imul   rax,rsi
   c:   c3                      ret
   d:   0f 1f 00                nop    DWORD PTR [rax]

0000000000000010 <_Z5calc2mm>:
  10:   48 89 f8                mov    rax,rdi
  13:   31 d2                   xor    edx,edx
  15:   48 f7 f6                div    rsi
  18:   48 89 f8                mov    rax,rdi
  1b:   48 29 d0                sub    rax,rdx
  1e:   c3                      ret

Expected
========
I do not see any difference in the results of the two C functions, so I
would expect that, ideally, both lead to the same asm (either by making
calc use div-mov-sub, or by making calc2 use div-imul, whichever the
machine descriptions determine to be more beneficial).
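
For reference, a minimal sketch of why the two functions should agree: for unsigned operands and y != 0, C defines x == (x/y)*y + x%y, so x - x%y equals x/y*y and neither side can wrap. The helper check_equivalence and its limit parameter are not part of the report; they are hypothetical names added only to spot-check the identity over a small range.

```c
#include <assert.h>
#include <limits.h>

/* The two formulations from the report, unchanged. */
unsigned long calc(unsigned long x, unsigned long y)
{
        return x/y*y;
}
unsigned long calc2(unsigned long x, unsigned long y)
{
        return x - x % y;
}

/*
 * Hypothetical helper (not from the report): returns the number of
 * (x, y) pairs with 0 <= x < limit and 1 <= y < limit for which the
 * two formulations disagree. y starts at 1 because division by zero
 * is undefined for both functions.
 */
unsigned long check_equivalence(unsigned long limit)
{
        unsigned long mismatches = 0;

        for (unsigned long y = 1; y < limit; ++y)
                for (unsigned long x = 0; x < limit; ++x)
                        if (calc(x, y) != calc2(x, y))
                                ++mismatches;

        /* Also probe the top of the range, where wrap-around would show. */
        assert(calc(ULONG_MAX, 7) == calc2(ULONG_MAX, 7));
        return mismatches;
}
```

Calling check_equivalence(64) from a test program is expected to return 0. This is only a spot check over small inputs, not a proof; the argument rests on the identity stated above.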