http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55448
Bug #: 55448
Summary: using const-reference SSE or AVX types leads to unnecessary unaligned loads
Classification: Unclassified
Product: gcc
Version: 4.7.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kr...@kde.org

The following testcase:

#include <immintrin.h>

static inline __m256 add(const __m256 &a, const __m256 &b) {
    return _mm256_add_ps(a, b);
}
void foo(__m256 &a, const __m256 b) {
    a = add(a, b);
}

static inline __m128 add(const __m128 &a, const __m128 &b) {
    return _mm_add_ps(a, b);
}
void foo(__m128 &a, const __m128 b) {
    a = add(a, b);
}

compiled with "-O2 -mavx" leads to

        vmovups     (%rdi), %xmm1
        vinsertf128 $0x1, 16(%rdi), %ymm1, %ymm1
        vaddps      %ymm0, %ymm1, %ymm0
        vmovaps     %ymm0, (%rdi)

for the __m256 case and

        vmovups     (%rdi), %xmm1
        vaddps      %xmm0, %xmm1, %xmm0
        vmovaps     %xmm0, (%rdi)

for the __m128 case. It should instead be:

        vaddps      (%rdi), %ymm0, %ymm0
        vmovaps     %ymm0, (%rdi)

and:

        vaddps      (%rdi), %xmm0, %xmm0
        vmovaps     %xmm0, (%rdi)

The latter result is obtained if the const-reference arguments to add are changed to pass by value.