[Bug rtl-optimization/47010] New: Missed optimization: x86-64 prologue not deleted

schnetter at gmail dot com Sat, 18 Dec 2010 18:43:21 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47010


           Summary: Missed optimization: x86-64 prologue not deleted
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: schnet...@gmail.com


Created attachment 22818
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22818
pre-processed bzipped source code

g++ 4.5.1 generates the following code on an x86-64 architecture (Mac OS X
10.6). The function is static, and g++ may even have modified its argument
list (note the ".clone.1" suffix). I believe the three instructions "pushq",
"movq", and "leave" are not necessary: they only set up and tear down the
frame pointer %rbp, which the function body never uses. This routine is called
in a compute-intensive inner loop that has problems fitting into the level 1
instruction cache.

The disassembled routine is:

__ZL20PDstandardNth11_implPKdll.clone.1:
0000000000000140        pushq   %rbp
0000000000000141        movupd  0x10(%rdi),%xmm3
0000000000000146        movupd  0xf0(%rdi),%xmm0
000000000000014b        movupd  0x08(%rdi),%xmm2
0000000000000150        addpd   %xmm3,%xmm0
0000000000000154        movupd  0xf8(%rdi),%xmm1
0000000000000159        movq    %rsp,%rbp
000000000000015c        addpd   %xmm2,%xmm1
0000000000000160        mulpd   0x000a0578(%rip),%xmm1
0000000000000168        addpd   %xmm0,%xmm1
000000000000016c        movupd  (%rdi),%xmm0
0000000000000170        mulpd   0x000a0578(%rip),%xmm0
0000000000000178        leave
0000000000000179        addpd   %xmm1,%xmm0
000000000000017d        ret

The original function is defined as:

static CCTK_REAL_VEC PDstandardNth11_impl(CCTK_REAL const* restrict const u,
ptrdiff_t const dj, ptrdiff_t const dk) __attribute__((pure))
__attribute__((noinline)) __attribute__((unused));

static CCTK_REAL_VEC PDstandardNth11_impl(CCTK_REAL const* restrict const u,
ptrdiff_t const dj, ptrdiff_t const dk)
{
  return
    kmadd(ToReal(30), vec_loadu_maybe3(0,0,0,(u)[(0)+dj*(0)+dk*(0)]),
      kmadd(ToReal(-16),
        kadd(vec_loadu_maybe3(-1,0,0,(u)[(-1)+dj*(0)+dk*(0)]),
             vec_loadu_maybe3(1,0,0,(u)[(1)+dj*(0)+dk*(0)])),
        kadd(vec_loadu_maybe3(-2,0,0,(u)[(-2)+dj*(0)+dk*(0)]),
             vec_loadu_maybe3(2,0,0,(u)[(2)+dj*(0)+dk*(0)]))));
}

where CCTK_REAL is double, and CCTK_REAL_VEC is __m128d, the SSE2 vector of
two doubles. The macros in the function body (ToReal, kadd, kmadd,
vec_loadu_maybe3) translate directly to Intel SSE2 vector instructions.
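
For readers without the attachment, here is a minimal sketch of what such a
function might look like when written directly with SSE2 intrinsics. The
helper names (to_real, k_add, k_madd, stencil_sketch, stencil_sketch_lane0)
are hypothetical stand-ins, and the assumption that kmadd(a,b,c) means a*b+c
and that vec_loadu_maybe3 reduces to an unaligned load is mine; the real
definitions are in the attached pre-processed source and may differ.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_loadu_pd, _mm_add_pd, ...

// Hypothetical stand-ins for the macros used in the report (assumptions,
// not the definitions from the attached source).
static inline __m128d to_real(double x) { return _mm_set1_pd(x); }
static inline __m128d k_add(__m128d a, __m128d b) { return _mm_add_pd(a, b); }
// Assumed meaning of kmadd: a * b + c.
static inline __m128d k_madd(__m128d a, __m128d b, __m128d c) {
  return _mm_add_pd(_mm_mul_pd(a, b), c);
}

// Five-point stencil along the first index, two lanes at a time.
// noinline keeps the function out of line so its prologue can be inspected.
__attribute__((noinline))
static __m128d stencil_sketch(const double* u) {
  return k_madd(to_real(30), _mm_loadu_pd(u),
           k_madd(to_real(-16),
                  k_add(_mm_loadu_pd(u - 1), _mm_loadu_pd(u + 1)),
                  k_add(_mm_loadu_pd(u - 2), _mm_loadu_pd(u + 2))));
}

// Scalar wrapper returning the low lane, convenient for checking results.
double stencil_sketch_lane0(const double* u) {
  return _mm_cvtsd_f64(stencil_sketch(u));
}
```

Compiling a file like this with -O3 and disassembling stencil_sketch is one
way to check whether a given compiler version still emits the frame-pointer
setup; I have not verified that this reduced sketch reproduces the problem.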

The code was compiled with g++ 4.5.1 using the options

g++-mp-4.5 -g3 -m128bit-long-double -march=native -std=gnu++0x -O3
-funsafe-loop-optimizations -fsee -ftree-loop-linear -ftree-loop-im -fivopts
-fvect-cost-model -funroll-loops -funroll-all-loops
-fvariable-expansion-in-unroller -fprefetch-loop-arrays -ffast-math
-fassociative-math -freciprocal-math -fno-trapping-math -fexcess-precision=fast
-fopenmp -Wall -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align
-Woverloaded-virtual 

I attach the complete pre-processed and bzipped source code. The source code
itself is auto-generated.
