https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83952

            Bug ID: 83952
           Summary: [missed optimization] difference calculation for
                    floats vs ints in a loop
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: eyalroz at technion dot ac.il
  Target Milestone: ---

Created attachment 43195
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43195&action=edit
Code exemplifying the issue

Consider the following code:

template <typename T>
void foo(T* __restrict__ a)
{
    int i; T val = 0;
    for (i = 0; i < 100; i++) {
        val = 2 * i;
        a[i] = val;
    }
}

template void foo<int>(int* __restrict__ a);
template void foo<float>(float* __restrict__ a);

(This is based on example 7.26 in Agner Fog's "Optimizing Software in C++",
but the use of C++ here is immaterial.)

The int version compiles, with -O2, into:

void foo<int>(int*):
        xor     eax, eax
.L2:
        mov     DWORD PTR [rdi], eax
        add     eax, 2
        add     rdi, 4
        cmp     eax, 200
        jne     .L2
        rep ret

One would expect the float version to compile into something similar, except
that instead of eax the stored value would live in a floating-point register,
initialized to 0 and incremented by float 2.0 with each iteration. Instead, we
get:

void foo<float>(float*):
        xor     eax, eax
.L6:
        pxor    xmm0, xmm0
        add     rdi, 4
        cvtsi2ss        xmm0, eax
        add     eax, 2
        movss   DWORD PTR [rdi-4], xmm0
        cmp     eax, 200
        jne     .L6
        rep ret

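which seems to be much slower: every iteration performs an int-to-float
conversion (cvtsi2ss, preceded by a pxor to break the dependency on xmm0)
where a single float addition would do.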

Checked here: https://godbolt.org/g/t8Hvyn
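
For illustration, the transformation I would expect can be written by hand
roughly as follows. This is only a sketch: the name foo_float_reduced is made
up for this example, and the rewrite relies on every even integer from 0 to
198 being exactly representable as a float, so the running sum matches the
converted value in each iteration.

```cpp
// Original loop from the report, repeated here for comparison.
template <typename T>
void foo(T* __restrict__ a)
{
    int i; T val = 0;
    for (i = 0; i < 100; i++) {
        val = 2 * i;
        a[i] = val;
    }
}

// Hypothetical hand-written variant (not compiler output): the float value
// is kept as its own induction variable and advanced by 2.0f per iteration,
// so no int-to-float conversion is needed inside the loop.
void foo_float_reduced(float* __restrict__ a)
{
    float val = 0.0f;
    for (int i = 0; i < 100; i++) {
        a[i] = val;   // store the current value, which equals float(2 * i)
        val += 2.0f;  // exact here: all values stay far below 2^24
    }
}
```

In general, replacing a conversion with repeated float additions is only safe
when no rounding can occur, which the known trip count makes provable in this
case.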
