/** intrin.c
 *
 * gcc-4.0 misoptimizes the _mm_load_sd() away with
 * -O1 (x86-64), with or without -m32 -msse2.
 *
 * (c) Kurt Garloff <[EMAIL PROTECTED]>, Artistic v2
 */

#include <stdlib.h>
#include <emmintrin.h>

#ifdef WORKAROUND
# define ACCESS(X) asm("": : "x"(X))
#else
# define ACCESS(X)
#endif

void do_copy(const unsigned int ln, double* const dst, const double* const src)
{
        int i = ln;
        const register double *s = src;
        register double *d = dst;
        __m128d TMP;
        while (i) {
                TMP = _mm_load_sd(s);
                ACCESS(TMP);
                _mm_store_sd(d, TMP);
                --i; ++s; ++d;
        }
}

int main()
{
        unsigned int i;
        double *a, *b ,*c;
        a = (double*) malloc(19*sizeof(double));
        b = (double*) malloc(19*sizeof(double));
        for (i = 0; i < 19; ++i) {
                a[i] = 1; b[i] = 2;
        }
        do_copy(19, a, b);
        return (a[18] != 2);
}

The test program should return 0, which it does if gcc-3.3/3.4 is used
or if it is compiled with -DWORKAROUND.

gcc-4.0, 4_0-branch, HEAD, and tree-profiling-branch all fail:
The _mm_load_sd() is optimized away.
I guess the compiler does not consider the _mm_store_sd() to be a
consumer of the vector register. Adding the fake consumer
asm(""::"x"(XMMREG)); therefore helps.

Compiling with -m32 -msse2 exposes the same problem, so I strongly
suspect that the native compiler on x86 has it as well.

Here's the wrong assembly produced by gcc-4.0 (on x86-64, using -O2):

do_copy:
.LFB495:
        testl   %edi, %edi
        jne     .L8
        rep ; ret
        .p2align 4,,7
.L8:
        xorl    %eax, %eax
        .p2align 4,,7
.L4:
        incl    %eax
        movq    $0, (%rsi)
        addq    $8, %rsi
        cmpl    %eax, %edi
        jne     .L4
        rep ; ret

... and here the correct assembly with -DWORKAROUND added:

do_copy:
.LFB495:
        testl   %edi, %edi
        jne     .L8
        rep ; ret
        .p2align 4,,7
.L8:
        xorl    %eax, %eax
        .p2align 4,,7
.L4:
        movsd   (%rdx), %xmm0
        incl    %eax
        movlpd  %xmm0, (%rsi)
        addq    $8, %rdx
        addq    $8, %rsi
        cmpl    %eax, %edi
        jne     .L4
        rep ; ret
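The test program in this report inspects only the last element, a[18]. A stricter check that compares every copied element could look roughly like the sketch below. It reuses the fake-consumer asm from the report; the names copy_sd_guarded and count_copy_mismatches are hypothetical, and the plain-C fallback for targets without SSE2 is an assumption added only to keep the sketch portable. It has not been tried against the affected gcc-4.0 builds.

```c
#include <stddef.h>
#include <stdlib.h>
#if defined(__SSE2__)
# include <emmintrin.h>
#endif

/* Hypothetical helper: copy n doubles one element at a time.
 * On SSE2 targets the empty asm names the loaded value as an input in an
 * SSE register ("x" constraint), the same fake consumer the report uses.
 * Elsewhere (an assumption of this sketch) a plain assignment is used. */
static void copy_sd_guarded(size_t n, double *dst, const double *src)
{
    size_t i;
    for (i = 0; i < n; ++i) {
#if defined(__SSE2__)
        __m128d tmp = _mm_load_sd(src + i);
        __asm__ volatile("" : : "x"(tmp));
        _mm_store_sd(dst + i, tmp);
#else
        dst[i] = src[i];
#endif
    }
}

/* Hypothetical checker: fill dst with 1.0 and src with 2.0, copy, and
 * return how many elements of dst differ from 2.0 afterwards.
 * Returns -1 if an allocation fails. */
static int count_copy_mismatches(size_t n)
{
    double *dst = (double *) malloc(n * sizeof(double));
    double *src = (double *) malloc(n * sizeof(double));
    int bad = 0;
    size_t i;

    if (dst == NULL || src == NULL) {
        free(dst);
        free(src);
        return -1;
    }
    for (i = 0; i < n; ++i) {
        dst[i] = 1.0;
        src[i] = 2.0;
    }
    copy_sd_guarded(n, dst, src);
    for (i = 0; i < n; ++i) {
        if (dst[i] != 2.0)
            ++bad;
    }
    free(dst);
    free(src);
    return bad;
}
```

Calling count_copy_mismatches(19) from main() and returning its result would mirror the original test: 0 means every element was copied, and any nonzero value signals a failure (a mismatch count, or -1 for a failed allocation).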
-- 
           Summary: Illegal elimination of SSE2 load/store using xmm intrinsics
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: kurt at garloff dot de
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: x86_64-suse-linux
  GCC host triplet: x86_64-suse-linux
GCC target triplet: x86_64-suse-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21239