Re: Big differences on SpecFP results for gcc and icc

Jan Hubicka Sun, 12 Jun 2005 02:39:45 -0700

> Hello!
> 
> There is an interesting comparison of SPEC scores between gcc and icc: 
> http://people.redhat.com/dnovillo/spec2000.i686/gcc/individual-run-ratio.html 
> . A quick look at the graphs shows big differences in achieved scores 
> between gcc and icc, mostly in SpecFP tests. I was trying to find some 
> information on this matter, but none can be found in the archives on gcc's 
> site.
> 
> Some interesting examples are:
> -177.mesa (this is a C test), where icc is almost 40% faster
> -178.galgel, where icc is again 40% faster
> -179.art, where llvm is more than 1.5x faster than both gcc and icc
> -187.facerec, where icc is 100% faster than gcc
> -189.lucas, where icc is 60% faster
> 
> I know that these graphs don't show the results of the most aggressive 
> optimization options for gcc, but that is also the case with icc (only 
> -O2). However, it looks like gcc and icc are not even in the same class 
> regarding FP performance. Perhaps there are some critical optimizations 
> that are not present in gcc?
> 
> I think I'm not the only person who finds these results rather 
> "disappointing". As Scott is currently writing a paper on gcc's FP 
> performance, perhaps someone has an explanation of why gcc's results are 
> so low on Pentium4 for these tests?


Part of the reason is that ICC defaults to SSE math while GCC defaults
to x87 math on 32-bit.  I am not sure what it does in the setup Diego
uses (i.e. whether vectorization is done or whether loops are unrolled).
Andreas's tester (http://www.suse.de/~aj/SPEC/amd64) shows similar
comparisons on Opteron for both 32-bit and 64-bit runs.  The ICC runs
use the same flags as AMD's published results, so presumably a good
choice of aggressive optimization flags.  This is comparing apples to
oranges too, as the 64-bit runs suffer from memory problems, the 32-bit
runs from x87, and ICC from lack of Opteron support, but it gives some
more idea.
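
One way to check which floating-point model a given build actually uses
is the C99 FLT_EVAL_METHOD macro.  The sketch below is only an
illustration; the function name fp_eval_method is made up for this
example.  On 32-bit x86 with GCC, a build using x87 math typically
reports 2, while a build with -mfpmath=sse -msse2 typically reports 0.

```c
#include <float.h>

/* Returns the floating-point evaluation mode the compiler reports
   through the C99 FLT_EVAL_METHOD macro:
     0  operations are evaluated in the precision of their own type
        (typical for SSE math),
     1  float and double operations are evaluated as double,
     2  everything is evaluated as long double (typical for x87 math),
    -1  indeterminable.
   fp_eval_method is a hypothetical helper name used for illustration. */
int fp_eval_method(void)
{
    return FLT_EVAL_METHOD;
}
```

Printing this value from builds made with different -mfpmath settings
shows whether two runs are comparing the same FP model at all.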

On Opteron we lose score in mesa because the ICC runs use profile
feedback and there is a division by a value that is always 360 in the
inner loop.  You can see the tree-profiling branch scoring better at
one point in its history, when profile feedback is available...  Mesa
also suffers from its code size being too large for the caches of the
Opteron CPU we use.
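
To make the profile-feedback point concrete, here is a hand-written
sketch of the kind of specialization a value profile can enable.  It is
not mesa's code and not the output of any particular compiler; the
function names and the integer types are assumptions for illustration.

```c
/* Generic version: the divisor is only known at run time, so a real
   divide instruction is needed on every call. */
int scale_generic(int x, int d)
{
    return x / d;
}

/* Hand-written equivalent of a value-profile-driven specialization,
   assuming the profile says d is almost always 360. */
int scale_specialized(int x, int d)
{
    if (d == 360)
        return x / 360;   /* constant divisor: can become multiply + shift */
    return x / d;         /* fallback keeps the original behaviour */
}
```

Both functions return the same result for every input; the specialized
one simply gives the compiler a constant divisor on the hot path.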

In 64-bit compilation, art suffers from register pressure caused by our
tree optimizers (at least the last time I tried).

Swim (and perhaps some other benchmarks too?) suffers from the fact that
its loops need to be interchanged; this was fixed by DannyB recently but
requires a special flag, so you don't see it in the scores (is this going
to be enabled by default?).
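
For readers unfamiliar with loop interchange, the following is a minimal
C sketch of the transformation, not code from swim (which is Fortran and
column-major, so the roles of the indices are reversed there).  The
function names are hypothetical.

```c
#include <stddef.h>

/* Before: the inner loop walks down a column of a row-major array, so
   consecutive accesses are `cols` elements apart in memory. */
void scale_strided(double *a, size_t rows, size_t cols, double k)
{
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            a[i * cols + j] *= k;
}

/* After interchange: the inner loop walks along a row, so consecutive
   accesses are adjacent in memory (unit stride). */
void scale_interchanged(double *a, size_t rows, size_t cols, double k)
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            a[i * cols + j] *= k;
}
```

Each element is touched exactly once in both versions, so the results
are identical; only the order of memory accesses changes.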

I didn't look too closely at the Fortran benchmarks.  I always assumed
that we did poorly on optimizing Fortran loops accessing variably sized
arrays and that we lack vectorization.  This week Zdenek improved SPECfp
scores quite impressively with his ivopts patches (PPC shows
order-of-magnitude improvements, but you can see improvement on Opteron
too), so we seem to do somewhat better now...
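
As a rough illustration of what induction variable optimization does for
loops over arrays whose stride is only known at run time, here is a
hand-written C sketch of strength reduction.  It is not taken from the
ivopts patches; the function names are hypothetical.

```c
#include <stddef.h>

/* Before: the element offset is recomputed from i with a multiply on
   every iteration. */
void fill_indexed(double *a, size_t n, size_t stride, double v)
{
    for (size_t i = 0; i < n; i++)
        a[i * stride] = v;
}

/* After strength reduction: a second induction variable `off` tracks
   i * stride and is advanced by an addition instead of a multiply. */
void fill_strength_reduced(double *a, size_t n, size_t stride, double v)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        a[off] = v;
        off += stride;
    }
}
```

Both functions write the same elements; the second form is what one
would hope the optimizer derives on its own from the first.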

Honza
> 
> Uros.
