https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114448

            Bug ID: 114448
           Summary: Roundup not optimized
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
  Target Milestone: ---

https://godbolt.org/z/4fPKGzs1M

Straightforward code which rounds up an unsigned number to the next multiple
of 4 is:

    (num % 4 == 0) ? num : num + (4 - num % 4);

gcc -O2 generates:

    mov     edx, edi
    mov     eax, edi
    and     edx, -4
    add     edx, 4
    test    dil, 3
    cmovne  eax, edx
    ret


This is not optimal; the branch/test can be avoided by using a double modulo:

    num + (4 - num % 4) % 4;

for which gcc -O2 generates:

    mov     eax, edi
    neg     eax
    and     eax, 3
    add     eax, edi
    ret


The optimal implementation of rounding up to a multiple of 4 uses a bit hack:

    (num + 3) & ~3;

for which gcc -O2 generates:

    lea     eax, [rdi+3]
    and     eax, -4
    ret
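
For reference, a minimal sketch that wraps the three expressions above in
functions so their equivalence can be checked. The function names are
hypothetical and not part of the report, and `num` is assumed to be
`unsigned int`. Because unsigned arithmetic wraps modulo a power of two, which
is itself a multiple of 4, the three forms should also agree for values close
to UINT_MAX.

```c
/* Variant 1: the conditional form from the report. */
static unsigned roundup4_cond(unsigned num)
{
    return (num % 4 == 0) ? num : num + (4 - num % 4);
}

/* Variant 2: the double-modulo form. */
static unsigned roundup4_mod(unsigned num)
{
    return num + (4 - num % 4) % 4;
}

/* Variant 3: the add-and-mask bit hack. */
static unsigned roundup4_mask(unsigned num)
{
    return (num + 3) & ~3u;
}

/* Returns 1 if all three variants produce the same result for num. */
static int roundup4_agree(unsigned num)
{
    unsigned r = roundup4_cond(num);
    return r == roundup4_mod(num) && r == roundup4_mask(num);
}
```

Calling roundup4_agree() over a range of inputs is a quick way to confirm that
rewriting the first two forms into the third would preserve behaviour.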
