http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54349

Ondrej Bilka <neleai at seznam dot cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

--- Comment #2 from Ondrej Bilka <neleai at seznam dot cz> 2012-08-23 15:45:54 UTC ---
(In reply to comment #1)
> Not a bug.  You need to tune for a CPU where inter-unit moves are desirable. 
> The default is generic tuning, which is a compromise between Intel CPUs (where
> they are desirable) and AMD CPUs (where they are undesirable).  In this
> particular case the generic tuning doesn't do inter-unit moves as part of the
> compromise.  If you -mtune=corei7 or similar, you'll get an inter-unit move in
> both cases.

Which AMD processors?

Compile the following two files with -march=core2 and with -march=amdfam10.
The AMD version was always at least 5% slower.
Tested on AMD Athlon(tm) 64 Processor 3200+, AMD Opteron(tm) Processor 6134,
AMD FX(tm)-8150 Eight-Core Processor, and AMD Phenom(tm) II X6 1090T Processor.


#include <emmintrin.h>
#include <stdint.h>

int64_t foo(int64_t a, int64_t c)
{
  __m128i b = _mm_cvtsi64_si128(a), d = _mm_cvtsi64_si128(c);
  return _mm_cvtsi128_si64(_mm_add_epi8(b, d));
}
/* Second file. foo() and main() need to be split into separate files,
   otherwise both versions are simplified to identical code. */
#include <emmintrin.h>
#include <stdint.h>

int64_t foo(int64_t a, int64_t c);  /* defined in the first file */

int main(){
  int i;
  int64_t x=0;
  for (i=0;i<100000000;i++) x=foo(x,1);
  return x;
}
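
For anyone who wants to reproduce the numbers without an external timer, here is a minimal sketch of an in-process measurement. It is not the code used for the results above. The names time_loop, binop_fn and add_bytes are hypothetical helpers introduced only for illustration, and clock() reports CPU time, which is assumed to be good enough for a single-threaded loop like this one. add_bytes is a portable stand-in that returns the same value as foo() (eight independent byte additions with no carry between bytes), so the harness can be checked on machines without SSE2.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper type: any function with foo()'s signature. */
typedef int64_t (*binop_fn)(int64_t, int64_t);

/* Runs x = fn(x, 1) `iters` times, stores the final x in *result and
   returns the CPU time used, in seconds. */
static double time_loop(binop_fn fn, long iters, int64_t *result)
{
  assert(fn && result);
  int64_t x = 0;
  clock_t start = clock();
  for (long i = 0; i < iters; i++)
    x = fn(x, 1);
  clock_t end = clock();
  *result = x;  /* hand the value back so the loop is not dead code */
  return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Portable stand-in giving the same result as foo(): adds the eight
   bytes of a and c independently, discarding the carry out of each byte. */
static int64_t add_bytes(int64_t a, int64_t c)
{
  uint64_t ua = (uint64_t)a, uc = (uint64_t)c, r = 0;
  for (int i = 0; i < 8; i++) {
    uint64_t sum = ((ua >> (8 * i)) + (uc >> (8 * i))) & 0xff;
    r |= sum << (8 * i);
  }
  return (int64_t)r;
}
```

To compare the two tunings, one would call time_loop(foo, 100000000, &x) from main, with foo still defined in its own file, build once per -march setting, and print the returned time. Since 100000000 is a multiple of 256 and the byte additions never carry, the final x is 0 in either build.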
