Re: spec2k comparison of gcc 4.1 and 4.2 on AMD K8

Jan Hubicka Sun, 25 Feb 2007 14:27:05 -0800

> "Vladimir N. Makarov" <[EMAIL PROTECTED]> writes:
> 
> > I run SPEC2000 several times per week and always look at 3 runs (to be
> > sure that nothing went wrong), but I have never seen such big
> > "confidence" intervals (as I understand it, that is the difference
> > between the max and min of 3 runs divided by the score). [...]
> 
> No, it is much more complex than that. I've used the generally accepted
> definition of a confidence interval, see
> http://en.wikipedia.org/wiki/Confidence_interval
> which basically says that, with 95% probability (the confidence level I
> chose), the true value lies in this interval.
> 
> I've used a conservative estimate of the confidence intervals in this
> case because I didn't assume a Gaussian distribution for the numbers I
> reported as the difference between two run times, and this estimate is
> somewhat bigger than the difference between the max and min of 3 runs :)
> 
> > [...] If the machine has only 512 MB of memory (even though they
> > write that it is enough for SPEC2000), the scores for some benchmark
> > programs may be unstable.  [...]
> 
> My box is equipped with 2 GB of RAM, so I believe this is not the case.
> Also, the computer was *absolutely* idle when it was running spec2k
> (booted with init=/bin/sh, with no other processes running).
> 
> And no,
> > [...] acknowledge that I never ran SPEC2000 on AMD machines and some
> > processors generate less "confident intervals". [...]
> this is not the case; I'm absolutely sure.
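The message does not say which conservative, non-Gaussian estimator was used. As a hedged illustration only, one distribution-free option is an interval derived from Chebyshev's inequality. The Python sketch below (function names hypothetical, numbers made up) compares such an interval with the max-minus-min spread described above.

```python
import math
import statistics


def max_min_spread(runs):
    """Spread as described in the quoted mail: (max - min) / mean score."""
    return (max(runs) - min(runs)) / statistics.mean(runs)


def chebyshev_interval(samples, confidence=0.95):
    """Distribution-free interval for the mean of `samples`.

    Chebyshev's inequality gives P(|X - mu| >= k * sigma) <= 1 / k**2,
    so k = 1 / sqrt(1 - confidence), about 4.47 for 95%.
    Assumption: the sample standard deviation stands in for the unknown
    sigma, so coverage is only approximate when n is as small as 3.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    k = 1.0 / math.sqrt(1.0 - confidence)
    return mean - k * std_err, mean + k * std_err


if __name__ == "__main__":
    # Made-up differences (seconds) between two compilers over 3 paired runs.
    diffs = [2.0, 3.0, 4.0]
    lo, hi = chebyshev_interval(diffs)
    print(f"Chebyshev 95% interval: [{lo:.2f}, {hi:.2f}]")
    print(f"max - min: {max(diffs) - min(diffs):.2f}")
```

For these made-up differences of 2, 3 and 4 seconds, the interval is roughly 3 ± 2.6 s, about 2.6 times as wide as the 2-second max-min spread, which matches the direction of the "somewhat bigger" remark.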


I run SPEC on both AMD and Intel machines quite often, and I must say
that there seems to be a difference between the two.  For P4 and Core
I get results within about 1-2 SPEC points (0.1%) of the overall SPEC
score; on Athlon I was never able to get that close.  The difference
tends to be up to one percent, which is often more than the expected
speedup I am looking for.
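Whether a change is visible depends on how its size compares with the run-to-run spread. The Python sketch below uses a hypothetical rule of thumb (not a SPEC-defined criterion) and made-up scores: the same roughly 0.5% improvement clears a roughly 0.1% spread but not a roughly 1% spread.

```python
import statistics


def relative_spread(scores):
    """Run-to-run noise as a fraction of the mean score: (max - min) / mean."""
    return (max(scores) - min(scores)) / statistics.mean(scores)


def speedup_is_resolvable(base_scores, new_scores):
    """Hypothetical rule of thumb: count a change as real only if the
    relative change in mean score exceeds the larger of the two spreads."""
    base = statistics.mean(base_scores)
    new = statistics.mean(new_scores)
    change = (new - base) / base
    noise = max(relative_spread(base_scores), relative_spread(new_scores))
    return change, noise, abs(change) > noise


if __name__ == "__main__":
    # Made-up scores: ~0.1% noise (as reported for P4/Core) vs ~1% (Athlon),
    # each with the same ~0.5% "expected speedup".
    quiet = speedup_is_resolvable([1000, 1001, 1001], [1005, 1006, 1006])
    noisy = speedup_is_resolvable([1000, 1005, 1010], [1005, 1010, 1015])
    for name, (change, noise, ok) in (("quiet", quiet), ("noisy", noisy)):
        print(f"{name}: change {change:.2%}, noise {noise:.2%}, resolvable={ok}")
```

A real comparison would use more runs and a proper statistical test; this rule only mirrors the spread-versus-speedup comparison made in the text.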

Of course it might be a property of the particular boxes I have, but
there is no difference in the setup of those machines; it just seems to
happen this way.  Running the tests several times in sequence tends to
stabilize the Athlon results, so what I often do is simply configure the
peak runs to test something interesting and reuse the same base runs,
since peak scores tend to be slightly better than base scores even for
identical binaries.  (That makes development easier, but not GCC better :)

Honza
