http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55448



             Bug #: 55448
           Summary: using const-reference SSE or AVX types leads to
                    unnecessary unaligned loads
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: kr...@kde.org





The following testcase:



#include <immintrin.h>

static inline __m256 add(const __m256 &a, const __m256 &b) { return _mm256_add_ps(a, b); }
void foo(__m256 &a, const __m256 b) { a = add(a, b); }

static inline __m128 add(const __m128 &a, const __m128 &b) { return _mm_add_ps(a, b); }
void foo(__m128 &a, const __m128 b) { a = add(a, b); }



compiled with "-O2 -mavx" leads to

        vmovups (%rdi), %xmm1
        vinsertf128     $0x1, 16(%rdi), %ymm1, %ymm1
        vaddps  %ymm0, %ymm1, %ymm0
        vmovaps %ymm0, (%rdi)



for the __m256 case and



        vmovups (%rdi), %xmm1
        vaddps  %xmm0, %xmm1, %xmm0
        vmovaps %xmm0, (%rdi)



for the __m128 case.



It should instead be (with AVX, the VEX-encoded vaddps accepts an unaligned
memory operand, so the load can be folded into the addition):

        vaddps  (%rdi), %ymm0, %ymm0
        vmovaps %ymm0, (%rdi)

and:

        vaddps  (%rdi), %xmm0, %xmm0
        vmovaps %xmm0, (%rdi)



The latter result is obtained if the const-reference arguments to add are
changed to pass-by-value.
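
For reference, a minimal sketch of that workaround (the same testcase, with
only the parameter types of add changed from const-reference to by-value):

#include <immintrin.h>

// By-value parameters: after inlining, the arguments stay in registers and
// the load through the reference in foo can be folded into vaddps.
static inline __m256 add(__m256 a, __m256 b) { return _mm256_add_ps(a, b); }
void foo(__m256 &a, const __m256 b) { a = add(a, b); }

static inline __m128 add(__m128 a, __m128 b) { return _mm_add_ps(a, b); }
void foo(__m128 &a, const __m128 b) { a = add(a, b); }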
