http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349
Ondrej Bilka <neleai at seznam dot cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

--- Comment #2 from Ondrej Bilka <neleai at seznam dot cz> 2012-08-23 15:45:54 UTC ---
(In reply to comment #1)
> Not a bug. You need to tune for a CPU where inter-unit moves are desirable.
> The default is generic tuning, which is a compromise between Intel CPUs (where
> they are desirable) and AMD CPUs (where they are undesirable). In this
> particular case the generic tuning doesn't do inter-unit moves as part of the
> compromise. If you -mtune=corei7 or similar, you'll get an inter-unit move in
> both cases.

Which AMD processors? Compile the following two files with -march=core2 and
with -march=amdfam10. The AMD version (-march=amdfam10) was always at least 5%
slower.

Tested on:
AMD Athlon(tm) 64 Processor 3200+
AMD Opteron(tm) Processor 6134
AMD FX(tm)-8150 Eight-Core Processor
AMD Phenom(tm) II X6 1090T Processor

#include <emmintrin.h>
#include <stdint.h>

int64_t foo(int64_t a, int64_t c)
{
    __m128i b = _mm_cvtsi64_si128(a), d = _mm_cvtsi64_si128(c);
    return _mm_cvtsi128_si64(_mm_add_epi8(b, d));
}

/* must be split into two files, otherwise both builds simplify to identical code */

#include <emmintrin.h>
#include <stdint.h>

int64_t foo(int64_t a, int64_t c);  /* defined in the first file */

int main()
{
    int i;
    int64_t x = 0;
    for (i = 0; i < 100000000; i++)
        x = foo(x, 1);
    return x;
}
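
For anyone who wants to reproduce the comparison from a single file, here is a rough sketch of a timing harness. It is not part of the original report. The helpers run_loop and time_loop are hypothetical names, and the GCC-specific noinline/noclone attributes are only an assumed stand-in for the two-file split; they may not yield exactly the same code as separate translation units.

```c
#include <emmintrin.h>
#include <stdint.h>
#include <time.h>

/* Same operation as foo() in the report. The GCC-specific attributes are an
   assumption: they keep foo() out of line so its arguments and result still
   cross between general-purpose and SSE registers at the call boundary. */
__attribute__((noinline, noclone))
int64_t foo(int64_t a, int64_t c)
{
    __m128i b = _mm_cvtsi64_si128(a);
    __m128i d = _mm_cvtsi64_si128(c);
    return _mm_cvtsi128_si64(_mm_add_epi8(b, d));
}

/* Hypothetical helper: repeat x = foo(x, 1) n times and return x.
   _mm_add_epi8 adds byte-wise without carry, so only the low byte changes. */
int64_t run_loop(long n)
{
    int64_t x = 0;
    for (long i = 0; i < n; i++)
        x = foo(x, 1);
    return x;
}

/* Hypothetical helper: CPU seconds spent in run_loop(n); the loop result is
   stored through 'result' so the work cannot be discarded. */
double time_loop(long n, int64_t *result)
{
    clock_t start = clock();
    *result = run_loop(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Building this once with -O2 -march=core2 and once with -O2 -march=amdfam10, then calling time_loop(100000000, &r) on the same machine, should give numbers comparable to the ones described above. Compiling with -S shows whether foo() moves values between register files directly or through memory.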