[Bug other/83951] New: [missed optimization] difference calculation for floats vs ints in a loop

eyalroz at technion dot ac.il Sat, 20 Jan 2018 02:01:09 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83951


            Bug ID: 83951
           Summary: [missed optimization] difference calculation for
                    floats vs ints in a loop
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Consider the following code:

template <typename T>
int foo(T* __restrict__ a)
{
    int i; T val = 0;
    for (i = 0; i < 100; i++) {
        val = 2 * i;
        a[i] = val;
    }
}

template int foo<int>(int* __restrict__ a);
template int foo<float>(float* __restrict__ a);

(This is based on example 7.26 in Agner Fog's "Optimizing software in C++",
but the use of C++ here is immaterial.)

The int version compiles, with -O2, into:

foo(int*):
        xor     eax, eax
.L2:
        mov     DWORD PTR [rdi], eax
        add     eax, 2
        add     rdi, 4
        cmp     eax, 200
        jne     .L2
        rep ret

One would expect the float version to compile into something similar, except
that the stored value would be kept in a floating-point register instead of
eax, initialized to 0 and incremented by float 2.0 with each iteration.
Instead, we get:

int foo<float>(float*):
        xor     eax, eax
.L6:
        pxor    xmm0, xmm0
        add     rdi, 4
        cvtsi2ss        xmm0, eax
        add     eax, 2
        movss   DWORD PTR [rdi-4], xmm0
        cmp     eax, 200
        jne     .L6
        rep ret

which converts the integer to float (cvtsi2ss) in every iteration, and seems
to be much slower.
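
To make the expected transformation concrete, here is a hand-written sketch of
the float loop with the conversion replaced by a float accumulator. The names
foo_float_reduced, foo_float_converted and check_equivalence are hypothetical
and not part of the report. The sketch assumes that the two forms are
interchangeable for this loop, which holds here because every value involved
(the even integers 0 through 198) is exactly representable as a float; it says
nothing about whether GCC can or should perform this rewrite in general.

```cpp
#include <cassert>

// Hypothetical hand-optimized variant: keep the value in a float
// accumulator and add 2.0f each iteration, so no int-to-float
// conversion is needed inside the loop.
void foo_float_reduced(float* __restrict__ a)
{
    float val = 0.0f;
    for (int i = 0; i < 100; i++) {
        a[i] = val;
        val += 2.0f;
    }
}

// Reference variant: same computation as foo<float> in the report,
// converting 2 * i to float in every iteration.
void foo_float_converted(float* __restrict__ a)
{
    for (int i = 0; i < 100; i++) {
        a[i] = static_cast<float>(2 * i);
    }
}

// Checks that both variants fill the array with identical values.
void check_equivalence()
{
    float reduced[100];
    float converted[100];
    foo_float_reduced(reduced);
    foo_float_converted(converted);
    for (int i = 0; i < 100; i++) {
        assert(reduced[i] == converted[i]);
    }
}
```

Comparing the -O2 output of the two functions should show whether the
accumulator form avoids the per-iteration cvtsi2ss.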

Checked here: https://godbolt.org/g/RVBNyY
