https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105521
Bug ID: 105521 Summary: missed optimization in modulo arithmetic Product: gcc Version: 11.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: goswin-v-b at web dot de Target Milestone: --- I'm trying to compute (a*a)%n for uint64_t types on x86_64 using "gcc -O2 -W -Wall" like this: #include <stdint.h> #include <assert.h> uint64_t sqrmod(uint64_t a, uint64_t n) { assert(a < n); unsigned __int128 x = a; x *= a; return x % n; } I expected to get the following code: sqrmod: cmpq %rsi, %rdi jnb .L13 // assert(a < n) failure movq %rdi, %rax mul %rdi div %rsi movq %rdx, %rax ret The compiler does get the "mul" right but instead of the "div" it throws in a call to "__umodti3". The "__umodti3" function is horribly long code that will be worlds slower than a simple div. Note: The "asset(a < n);" should tell the compiler that the "div" instruction can not overflow and will not cause a #DivisionError. Without the assert the compiler could (conditionally) add "a %= n;" for the same effect. https://godbolt.org/z/cd57Wd4oo