Hi Richard!

On Thu, Sep 8, 2011 at 11:02 AM, Richard Guenther
<richard.guent...@gmail.com> wrote:
> On Thu, Sep 8, 2011 at 12:31 AM, Steve White
> <stevan.wh...@googlemail.com> wrote:
>> Hi,
>>
>> I run some tests of simple number-crunching loops whenever new
>> architectures and compilers arise.
>>
>> These tests on recent Intel architectures show similar performance
>> between gcc and icc compilers, at full optimization.
>>
>> However a recent test on x86_64 showed the open64 compiler
>> outstripping gcc by a factor of 2 to 3.  I tried all the obvious
>> flags; nothing helped.
>
> Like -funroll-loops?
>

** Let's turn it around:  What is a good set of flags, then, for
improving the speed of simple loops such as these on x86_64?

In fact, I did try -funroll-loops and several others, but I somehow
fooled myself (maybe partly because, as I wrote, I was under the
impression that -O3 turned this on by default).

With -funroll-loops, performance improves a lot.

$ gcc --std=c99 -O3 -funroll-loops -Wall -pedantic mults_by_const.c
$ ./a.out
double array mults by const             320 ms [  1.013193]

That puts it only a factor of 2 slower than open64 at -O3.
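
For anyone unfamiliar with the transformation, here is a rough sketch
in C of what loop unrolling amounts to. It is an illustration only: the
function names and the unroll factor of 4 are assumptions made for this
example, not taken from mults_by_const.c, and gcc picks its own factor
and code shape.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one multiply and one loop test per element. */
void scale_simple(double *a, size_t n, double c)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        a[i] *= c;
}

/* Hand-unrolled by 4 (hypothetical factor, chosen for illustration).
 * Four multiplies share one loop test; a tail loop handles the
 * leftover elements when n is not a multiple of 4. */
void scale_unrolled4(double *a, size_t n, double c)
{
    assert(a != NULL || n == 0);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a[i]     *= c;
        a[i + 1] *= c;
        a[i + 2] *= c;
        a[i + 3] *= c;
    }
    for (; i < n; i++)
        a[i] *= c;
}
```

Both functions leave the same values in the array; the unrolled one
just spends fewer instructions on loop control per element.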

Furthermore, -march=native improves it yet more.

$ gcc --std=c99 -O3 -funroll-loops -march=native -Wall -pedantic mults_by_const.c
$ ./a.out
double array mults by const             300 ms [  1.013193]

Now it's only 70% slower than the open64 results.

I also tried these flags:
   -floop-optimize  -fmove-loop-invariants -fprefetch-loop-arrays -fprofile-use
but saw no further improvement.
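
For anyone who wants to try flag combinations without the real test
file, here is a minimal sketch of the kind of loop and timing involved.
It is not the actual mults_by_const.c: the names scale_array and
time_scale_ms, and the use of clock() for timing, are assumptions made
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Multiply every element of a[0..n-1] by the constant c, in place. */
void scale_array(double *a, size_t n, double c)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= c;
}

/* Run scale_array reps times and return the elapsed CPU time in
 * milliseconds, as measured by the standard clock() function. */
double time_scale_ms(double *a, size_t n, double c, int reps)
{
    assert(a != NULL && reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        scale_array(a, n, c);
    clock_t stop = clock();
    return 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;
}
```

When using something like this, print an element of the array after
the timed run so the compiler cannot discard the loop as dead code, and
pick n and reps large enough that the run takes well over the clock()
resolution.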

So I drop my claim of knowing what the problem is (and repent of even
having tried before).

Simple searches on the web turn up a lot of experiments but nothing definitive.

FWIW, also attached is the whole assembler file generated with the
above settings.

To my eye, the gcc assembly is a great deal more complicated and
does a lot more work, besides being slower.

Thanks!

Attachment: mults_by_const.s.gz
Description: GNU Zip compressed data
