> Hello!
>
> There is an interesting comparison of SPEC scores between gcc and icc:
> http://people.redhat.com/dnovillo/spec2000.i686/gcc/individual-run-ratio.html
> A quick look at the graphs shows big differences in achieved scores
> between gcc and icc, mostly in SpecFP tests. I was trying to find some
> information on this matter, but none can be found in the archives on gcc's
> site.
>
> Some interesting examples are:
> -177.mesa (this is a C test), where icc is almost 40% faster
> -178.galgel, where icc is again 40% faster
> -179.art, where llvm is more than 1.5x faster than both gcc and icc
> -187.facerec, where icc is 100% faster than gcc
> -189.lucas, where icc is 60% faster
>
> I know that these graphs don't show the results of the most aggressive
> optimization options for gcc, but that is also the case with icc (only
> -O2). However, it looks like gcc and icc are not even in the same class
> regarding FP performance. Perhaps there are some critical optimizations
> that are not present in gcc?
>
> I think I'm not the only person who finds these results rather
> "disappointing". As Scott is currently writing a paper on gcc's FP
> performance, perhaps someone has an explanation why gcc's results are
> so low on Pentium4 for these tests?

Part of the reason is that ICC defaults to SSE math while GCC defaults to
x87 math on 32-bit. I am not sure what is done in the setup Diego uses
(i.e. whether vectorization is done or whether loops are unrolled).

Andreas's tester (http://www.suse.de/~aj/SPEC/amd64) shows similar
comparisons on Opteron for both 32-bit and 64-bit runs. The ICC runs use
the same flags as AMD's published results, so presumably a good choice of
aggressive optimization flags. This is comparing apples to oranges too,
as the 64-bit runs suffer from memory problems, the 32-bit runs from x87,
and ICC from lack of Opteron support, but it gives some more idea.

On Opteron we lose score in mesa because the ICC runs are with profile
feedback and there is a division by a value that is always 360 in the
inner loop. You can see the tree-profiling branch scores being better
at one point in the history, when profile feedback was available...
Mesa also suffers from its code size being too large for the caches of
the Opteron CPU we use.

In 64-bit compilation, art suffers from register pressure caused by our
tree optimizers (at least the last time I tried).

Swim (and perhaps some other benchmarks too?) suffers from the fact that
its loops need to be interchanged. This was fixed by DannyB recently, but
it requires a special flag, so you don't see it in the scores (is this
going to be enabled by default?).

I didn't look too closely at the Fortran benchmarks. I always assumed
that we do poorly at optimizing Fortran loops accessing variably sized
arrays, and that we lack vectorization. Zdenek improved the SPECfp scores
quite impressively this week with his ivopts patches (PPC shows order of
magnitude improvements, but you can see an improvement on Opteron too),
so we seem to do somewhat better now...

Honza
>
> Uros.
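
To illustrate the mesa point in the reply above: a minimal sketch of what
profile feedback can buy when a divisor turns out to be (almost) always the
same value. The function names are hypothetical and this is not code from
Mesa or from either compiler; it also assumes an integer division, which
the message does not state. It only shows the shape of the transformation.

```c
/* Hypothetical illustration; not taken from Mesa, GCC or ICC. */

/* Generic form: the divisor is only known at run time, so the
   compiler has to emit a real divide instruction in the loop. */
int scale_generic(int x, int d)
{
    return x / d;
}

/* Roughly what a value-profile-driven specialization looks like
   when the profile says d is nearly always 360: a guarded fast
   path with a compile-time constant divisor (which compilers
   typically lower to a multiply and shift), plus a fallback that
   keeps the result correct for any other divisor. */
int scale_specialized(int x, int d)
{
    if (d == 360)
        return x / 360;
    return x / d;
}
```

Both functions return the same result for every input; the second one
just makes the common case cheap, at the cost of one well-predicted
compare per call.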