https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114448

            Bug ID: 114448
           Summary: Roundup not optimized
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
  Target Milestone: ---

https://godbolt.org/z/4fPKGzs1M

Straightforward code which rounds up an unsigned number to the next multiple
of 4 is:

    (num % 4 == 0) ? num : num + (4 - num % 4);

gcc -O2 generates:

        mov     edx, edi
        mov     eax, edi
        and     edx, -4
        add     edx, 4
        test    dil, 3
        cmovne  eax, edx
        ret

This is not optimal, and the branch/test can be avoided by using a double
modulo:

    num + (4 - num % 4) % 4;

for which gcc -O2 generates:

        mov     eax, edi
        neg     eax
        and     eax, 3
        add     eax, edi
        ret

The optimal implementation for rounding up to a multiple of 4 uses a bit hack:

    (num + 3) & ~3;

for which gcc -O2 generates:

        lea     eax, [rdi+3]
        and     eax, -4
        ret
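
For reference, here is a minimal sketch that wraps the three expressions from
this report in functions so their equivalence can be checked. The function
names (roundup4_cond, roundup4_mod, roundup4_mask, roundup4_forms_agree,
roundup4_selfcheck) are hypothetical and are not taken from the godbolt link;
num is assumed to be a plain unsigned int. Because unsigned arithmetic wraps,
all three forms should also agree near UINT_MAX, where the result wraps to 0.

```c
#include <assert.h>
#include <limits.h>

/* Form 1: conditional on the remainder (the "straightforward" code). */
static unsigned roundup4_cond(unsigned num)
{
    return (num % 4 == 0) ? num : num + (4 - num % 4);
}

/* Form 2: double modulo, no conditional in the source. */
static unsigned roundup4_mod(unsigned num)
{
    return num + (4 - num % 4) % 4;
}

/* Form 3: bit hack; ~3 converts to an unsigned mask with the low 2 bits clear. */
static unsigned roundup4_mask(unsigned num)
{
    return (num + 3) & ~3;
}

/* Hypothetical helper: returns 1 if all three forms agree for num. */
static int roundup4_forms_agree(unsigned num)
{
    unsigned expected = roundup4_cond(num);
    return roundup4_mod(num) == expected && roundup4_mask(num) == expected;
}

/* Hypothetical self-check over small values and the wraparound boundary. */
static void roundup4_selfcheck(void)
{
    unsigned i;

    for (i = 0; i < 1024; i++) {
        assert(roundup4_forms_agree(i));
        assert(roundup4_forms_agree(UINT_MAX - i));
    }
}
```

Calling roundup4_selfcheck() from a test program exercises the low range and
the values just below UINT_MAX. Since the three forms compute the same value
for every unsigned input, the request is that gcc emit the lea/and sequence
for all of them.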