Re: gcc will become the best optimizing x86 compiler

Agner Fog Thu, 24 Jul 2008 01:04:24 -0700

Dennis Clarke wrote:
>The Sun Studio 12 compiler with Solaris 10 on AMD Opteron or
>UltraSparc beats GCC in almost every single test case that I have
>seen.


This is memcpy on Solaris:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/i386/gen/memcpy.s

It uses exactly the same method as memcpy on gcc libc, with only minor differences that have no influence on performance.

>Also, you have provided no data at all.

I have linked to the data rather than copying it here to save space on the mailing list. Here is the link again:

http://www.agner.org/optimize/optimizing_cpp.pdf  section 2.6, page 12.

>So your assertions are those of a marketing person at the moment.


Who sounds like a marketing person, you or me? :-)

>Please post some code that can be compiled and then tested with high resolution timers and perhaps we can compare notes.

Here is my code, again:
http://www.agner.org/optimize/asmlib.zip

My test results, referred to above, use the "core clock cycles" performance counter on Intel and RDTSC on AMD. It's the highest resolution you can get. Feel free to do your own tests, it's as simple as linking my library into your test program.
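
For anyone who wants to repeat this kind of measurement, a minimal sketch in C might look like the following. It is not the harness behind the results above. The names read_ticks, time_copy, bench_src and bench_dst are made up for illustration. It reads the time stamp counter through __rdtsc(), which is not the same thing as the "core clock cycles" performance counter, and it falls back to clock_gettime on non-x86 targets.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* __rdtsc() on GCC and Clang */
#endif

#define BENCH_MAX 4096   /* largest block this sketch will time */

/* Same signature as memcpy, so any candidate routine can be plugged in. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical test buffers, 16-byte aligned so the offsets below
   control the misalignment exactly. */
static _Alignas(16) unsigned char bench_src[BENCH_MAX + 16];
static _Alignas(16) unsigned char bench_dst[BENCH_MAX + 16];
static volatile unsigned char bench_sink;

/* Read a tick counter: TSC on x86, nanoseconds elsewhere (assumption). */
static uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Best-of-reps tick count for one copy of len bytes, with destination and
   source displaced from 16-byte alignment by dst_off and src_off.
   Returns UINT64_MAX if the arguments are out of range. */
static uint64_t time_copy(copy_fn fn, size_t dst_off, size_t src_off,
                          size_t len, int reps)
{
    uint64_t best = UINT64_MAX;

    if (dst_off >= 16 || src_off >= 16 || len == 0 || len > BENCH_MAX || reps < 1)
        return UINT64_MAX;

    for (size_t i = 0; i < sizeof bench_src; i++)
        bench_src[i] = (unsigned char)(i * 31u + 7u);

    for (int r = 0; r < reps; r++) {
        memset(bench_dst, 0, sizeof bench_dst);
        uint64_t t0 = read_ticks();
        fn(bench_dst + dst_off, bench_src + src_off, len);
        uint64_t t1 = read_ticks();
        bench_sink = bench_dst[dst_off + len - 1];  /* keep the copy observable */
        if (t1 - t0 < best)
            best = t1 - t0;
    }

    /* Sanity check: the routine under test really copied the data. */
    assert(memcmp(bench_dst + dst_off, bench_src + src_off, len) == 0);
    return best;
}
```

Calling time_copy(memcpy, 1, 3, 1000, 100) and then the same with another routine of the same signature gives two numbers to compare for one unaligned case. Taking the minimum over many repetitions is a design choice to reduce noise from interrupts and cold caches.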


Tim Prince wrote:
>you identify the library you tested only as "ubuntu g++ 4.2.3."
Where can I see the libc version?

>The corresponding 64-bit linux will see vastly different levels of performance, depending on the glibc version, as it doesn't use a builtin string move.

Yes, this is exactly what my tests show. 64-bit libc is better than 32-bit libc, but still 3-4 times slower than the best library for unaligned operands on an Intel.

>Certain newer CPUs aim to improve performance of the 32-bit gcc builtin string moves, but don't entirely eliminate the situations where it isn't optimum.

The Intel manuals are not clear about this. Intel Optimization reference manual says:

>In most cases, applications should take advantage of the default memory routines provided by Intel compilers.

What excellent advice - the Intel compiler puts in a library with an automatic run-slowly-on-AMD feature!

The Intel library does not use rep movs when running on an Intel CPU.

The AMD software optimization guide mentions specific situations where rep movs is optimal. However, my tests on an Opteron (K8) show that rep movs is never optimal on AMD either. I have no access to test it on the new AMD K10, but I expect the XMM register code to run much faster on K10 than on K8 because K10 has 128-bit data paths where K8 has only 64-bit.
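
To make the two approaches concrete, here is a rough sketch in C of a rep movs copy and a copy through XMM registers. It is not the code from asmlib or from any libc. The function names copy_rep_movs and copy_xmm are hypothetical, the string instruction is written with GCC-style inline assembly (and assumes the direction flag is clear, as the usual ABIs require), and the XMM loop uses plain unaligned SSE2 loads and stores without any of the alignment handling a tuned routine would need. On targets without these features both functions fall back to portable code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: __m128i, _mm_loadu_si128, ... */
#endif

/* Copy n bytes with the x86 string instruction (byte variant).
   Falls back to memcpy where GCC-style x86 inline asm is unavailable. */
static void *copy_rep_movs(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *d = dst;
    const void *s = src;
    /* rep movsb copies (E/R)CX bytes from [(E/R)SI] to [(E/R)DI]. */
    __asm__ volatile ("rep movsb"
                      : "+D"(d), "+S"(s), "+c"(n)
                      :
                      : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}

/* Copy n bytes 16 at a time through an XMM register, then finish the
   tail byte by byte. Without SSE2 only the byte loop is compiled. */
static void *copy_xmm(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);  /* unaligned load  */
        _mm_storeu_si128((__m128i *)d, v);                /* unaligned store */
        s += 16;
        d += 16;
        n -= 16;
    }
#endif
    for (; n > 0; n--)
        *d++ = *s++;
    return dst;
}
```

Both functions have the same signature as memcpy, so they can be timed against each other for various sizes and misalignments on the CPU of interest.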

Evidently, the problem with memcpy has been ignored for years, see http://softwarecommunity.intel.com/Wiki/Linux/719.htm
