I've been running some compiler benchmarks on an AMD K6-3 450 under
CURRENT with GCC 2.95.2.  If anyone is interested in the results, here
they are.  I'm doing performance profiling for our future 4.0 rollout.

K6 processors are dead hardware, but I still use them for clustered
machines in a couple of web server farms.  The 2368K (64K L1, 256K L2,
2048K L3) of cache RAM seems to make these machines very fast for their
class.

FreeBSD 4.0 & GCC 2.95.2 / Nbench 2.1

CFLAGS                                  Memory  Integer Floating-Point

A) -s -O                                2.177   1.630   1.771
B) -s -Os                               1.803   1.651   1.807
C) -s -O2                               2.099   1.682   2.122
D) -s -O3                               2.141   1.852   1.979
E) -s -O6                               2.144   1.850   1.981
F) E + -static                          2.191   1.917   2.258
G) F + -ffast-math                      2.134   1.850   2.269
H) G + -fexpensive-optimizations        2.133   1.888   2.72
I) H + -funroll-loops                   2.385   2.083   2.309
J) I + -mcpu=k6 -march=k6               2.385   2.076   2.247
K) I + -malign-functions=4              2.462   2.075   2.293
L) K + -malign-loops=2 -malign-jumps=2  2.459   2.077   2.293
M) K + -malign-loops=4 -malign-jumps=4  2.376   2.070   2.299
N) K + -fschedule-insns2                2.461   2.075   2.292
O) K + -mwide-multiply                  2.461   2.074   2.293
P) K + -malign-double                   2.427   2.089   2.287
Q) K + -mpreferred-stack-boundary=2     2.430   2.090   2.311
R) Q + -fno-caller-saves                2.365   2.038   2.335

Higher numbers are of course better.  I had the best overall benchmark
using:

-O6 -static -ffast-math -fexpensive-optimizations -funroll-loops
-malign-functions=4 -mpreferred-stack-boundary=2 -s
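
One way to sanity-check a "best overall" pick is to rank every row by a
single combined score.  The sketch below is not how the pick above was
made; it assumes the geometric mean of the three nbench indices is a fair
summary, and the names RESULTS, geometric_mean and rank are made up for
illustration.  The values are copied from the table exactly as printed.

```python
from math import prod

# (memory, integer, floating-point) indices, copied from the table as printed
RESULTS = {
    "A": (2.177, 1.630, 1.771),
    "B": (1.803, 1.651, 1.807),
    "C": (2.099, 1.682, 2.122),
    "D": (2.141, 1.852, 1.979),
    "E": (2.144, 1.850, 1.981),
    "F": (2.191, 1.917, 2.258),
    "G": (2.134, 1.850, 2.269),
    "H": (2.133, 1.888, 2.72),
    "I": (2.385, 2.083, 2.309),
    "J": (2.385, 2.076, 2.247),
    "K": (2.462, 2.075, 2.293),
    "L": (2.459, 2.077, 2.293),
    "M": (2.376, 2.070, 2.299),
    "N": (2.461, 2.075, 2.292),
    "O": (2.461, 2.074, 2.293),
    "P": (2.427, 2.089, 2.287),
    "Q": (2.430, 2.090, 2.311),
    "R": (2.365, 2.038, 2.335),
}


def geometric_mean(values):
    """Geometric mean of a sequence of positive numbers."""
    return prod(values) ** (1.0 / len(values))


def rank(results):
    """Return row labels sorted from best to worst geometric mean."""
    return sorted(results, key=lambda row: geometric_mean(results[row]),
                  reverse=True)


if __name__ == "__main__":
    # Show the three highest-scoring flag sets
    for row in rank(RESULTS)[:3]:
        print(row, round(geometric_mean(RESULTS[row]), 3))
```

Under this scoring, row Q comes out first, narrowly ahead of K.  A
different weighting (for example, integer only) could reorder the top rows.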

I am very surprised that adding -mcpu=k6 and -march=k6 actually lowered
performance rather than raising it: both the integer and floating-point
scores dropped slightly from I to J.

For floating-point math, it looks like -O2, -ffast-math, and
-fno-caller-saves are useful.

Not bad getting about 100 MHz worth of performance gains from the optimal
CFLAGS.  Out of curiosity, I compiled a system and kernel with these
CFLAGS, minus -static.  Running well so far, and now into the second make
world -j4.
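
The "about 100 MHz" figure can be roughly reproduced from the table.  This
is only a sketch under stated assumptions: row A (-s -O) is taken as the
baseline, row Q as the tuned build, the three indices are averaged with
equal weight, and performance is assumed to scale linearly with the 450 MHz
clock.  None of these assumptions are stated in the post, and the function
name is hypothetical.

```python
BASE_CLOCK_MHZ = 450               # clock speed of the test CPU

BASELINE = (2.177, 1.630, 1.771)   # row A: -s -O
TUNED = (2.430, 2.090, 2.311)      # row Q: best overall flag set


def equivalent_mhz_gain(baseline, tuned, clock_mhz):
    """Express the mean speedup as extra clock speed (assumes linear scaling)."""
    ratios = [t / b for b, t in zip(baseline, tuned)]
    mean_speedup = sum(ratios) / len(ratios)
    return (mean_speedup - 1.0) * clock_mhz


if __name__ == "__main__":
    gain = equivalent_mhz_gain(BASELINE, TUNED, BASE_CLOCK_MHZ)
    print(f"roughly {gain:.0f} MHz equivalent")
```

With these inputs the estimate lands a little above 100 MHz, most of it
from the integer and floating-point columns rather than memory.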

Doing similar compiler benchmarks for our Athlon systems next week.  When
I use Pentium II machines, I have found -Os works best because of their
small L1 cache.

-
Natha Kinsman

