>  With the compiler from the ira branch on x86_64-linux, here are the
>  timings reported by "gfortran -c -time -save-temps" with and without
>  IRA (two timings provided for each set of option, to check
>  reproducibility)

OK, I come back with fresh numbers from the current IRA branch, rev.
135035, which I believe includes the fix for -O0 compilation time
(thanks, by the way!). I'm still compiling the same huge testcase
(from CP2K), which is a good example of relatively heavy use of
Fortran 95 features. Memory used during compilation was up to 3 GB
when optimization is turned on (this is an 8 GB system, and I checked
that disk swap didn't come into play). This is on x86_64-linux.


At -O0: 3% decrease wrt current, no further effect for -fira-algorithm=CB
At -O0 -g: 3% decrease wrt current, slightly smaller (-1.5%) with
-fira-algorithm=CB
At -O1: 7% increase wrt current; -fira-algorithm=CB turns this into
only a 2% increase
At -O2: 5% increase for -fira; only 1.5% increase when
-fira-algorithm=CB is used
At -O2 -ffast-math, -O3 and -O3 -ffast-math: roughly same as -O2, 3%
to 5% increase for -fira, down to a 1%-2% increase when
-fira-algorithm=CB is used.
With -funroll-loops, -ftree-vectorize or both: again, roughly the same.

I've also tried gfortran's -fbounds-check option, which greatly
increases the amount of code emitted by the front-end for a given
source, and haven't seen any significant difference from the results
reported above (in particular, no performance degradation).

I've also played with -m32 at various optimization levels, and the
results are again in the same range as above for -m64.


*Conclusions*

All in all, the -O0 performance is now on par with the old allocator,
and at higher optimisation levels, we see a 3% to 5% regression. The
CB algorithm is faster, with a regression of only 1.5% to 2%.

I'll now turn to benchmarking of generated code (I'll run the
Polyhedron benchmark, which is widely known and referred to in the
Fortran community). I don't have the guts to do a systematic check of
memory consumption of the compiler, but I think it'd be nice if
someone could do that.

FX


PS: I attach the file containing all timings. For each set of options,
I ran the compiler twice; when the two timings differ significantly,
that's because other users were on the machine (a rather underused
dual-core, dual-processor box, with an average load of 1.09 during my
tests), so I take the smallest number for the calculations.
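
For anyone who wants to redo the arithmetic, here is a minimal sketch of
the calculation described above: take the smallest of the two timings in
each group and compare it with the baseline. It assumes the first number
on each "# f951" line is user time and the second is system time, and it
uses only the first. The names parse_timings and percent_change are made
up for this illustration; the sample data is the -O2 group from the
attached file.

```python
import re

# Sample copied from the attached timings (-O2 group).
SAMPLE = """\
-O2
# f951 670.03 16.17
# f951 669.36 14.80
-O2 -fira
# f951 701.83 14.23
# f951 703.29 15.17
-O2 -fira -fira-algorithm=CB
# f951 682.19 15.01
# f951 678.86 15.06
"""

TIMING_RE = re.compile(r"#\s+f951\s+([\d.]+)\s+([\d.]+)$")


def parse_timings(text):
    """Map each option string to the list of first-column f951 timings."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = TIMING_RE.match(line)
        if match and current is not None:
            # Assumption: first column is user time, second is system time.
            results[current].append(float(match.group(1)))
        elif line.startswith("-") and not line.startswith("--"):
            # Option lines start with a single dash; "----" lines are separators.
            current = line
            results[current] = []
    return results


def percent_change(baseline, candidate):
    """Relative change in percent, using the smallest timing of each group."""
    base = min(baseline)
    return 100.0 * (min(candidate) - base) / base


if __name__ == "__main__":
    timings = parse_timings(SAMPLE)
    reference = timings["-O2"]
    for options, values in timings.items():
        if options != "-O2":
            print(f"{options}: {percent_change(reference, values):+.1f}%")
```

Taking the minimum rather than the mean is deliberate here: load from
other users can only make a run slower, so the smaller number is the
better estimate.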

-- 
FX Coudert
http://www.homepages.ucl.ac.uk/~uccafco/
-O0
# f951 135.59 6.88
# f951 135.91 9.86

-O0 -fira
# f951 131.26 6.41
# f951 131.19 6.49

-O0 -fira -fira-algorithm=CB
# f951 131.20 6.76
# f951 130.84 6.80

--------------

-O1
# f951 477.87 14.74
# f951 478.26 14.46

-O1 -fira
# f951 511.43 14.69
# f951 510.64 13.56

-O1 -fira -fira-algorithm=CB
# f951 488.57 14.45
# f951 488.54 13.67

------------------------------

-O2 
# f951 670.03 16.17
# f951 669.36 14.80
-O2 -fira
# f951 701.83 14.23
# f951 703.29 15.17
-O2 -fira -fira-algorithm=CB
# f951 682.19 15.01
# f951 678.86 15.06

------------------------------

-O2 -ffast-math 
# f951 675.44 16.60
# f951 673.41 16.63
-O2 -ffast-math -fira
# f951 706.19 14.39
# f951 706.00 13.76
-O2 -ffast-math -fira -fira-algorithm=CB
# f951 688.10 14.68
# f951 736.99 18.26

------------------------------

-O3 
# f951 844.27 15.13
# f951 845.93 14.35
-O3 -fira
# f951 872.07 16.54
# f951 873.54 13.72
-O3 -fira -fira-algorithm=CB
# f951 854.09 14.85
# f951 847.93 16.90

------------------------------

-O3 -ffast-math 
# f951 846.92 14.47
# f951 846.12 16.58
-O3 -ffast-math -fira
# f951 877.64 14.22
# f951 883.09 13.62
-O3 -ffast-math -fira -fira-algorithm=CB
# f951 865.35 13.44
# f951 891.76 16.52

------------------------------

-O3 -ffast-math -funroll-loops 
# f951 1112.40 15.43
# f951 1091.32 15.83
-O3 -ffast-math -funroll-loops -fira
# f951 1123.51 13.97
# f951 1126.89 15.50
-O3 -ffast-math -funroll-loops -fira -fira-algorithm=CB
# f951 1106.21 15.21
# f951 1108.12 15.91

------------------------------

-O3 -ffast-math -funroll-loops -ftree-vectorize 
# f951 1093.59 14.93
# f951 1092.91 15.98
-O3 -ffast-math -funroll-loops -ftree-vectorize -fira
# f951 1149.13 15.80
# f951 1134.78 14.84
-O3 -ffast-math -funroll-loops -ftree-vectorize -fira -fira-algorithm=CB
# f951 1107.87 14.71
# f951 1092.80 13.97

------------------------------

-O0 -m32 
# f951 133.29 6.63
# f951 133.38 6.97
-O0 -m32 -fira
# f951 132.86 7.68
# f951 134.41 7.03
-O0 -m32 -fira -fira-algorithm=CB
# f951 133.95 6.98
# f951 132.94 5.96

------------------------------

-O2 -m32 
# f951 654.35 14.56
# f951 652.43 13.62
-O2 -m32 -fira
# f951 675.74 14.10
# f951 686.01 13.97
-O2 -m32 -fira -fira-algorithm=CB
# f951 659.19 14.44
# f951 666.36 14.48

------------------------------

-O3 -ffast-math -funroll-loops -ftree-vectorize -m32 
# f951 974.28 15.45
# f951 1024.43 15.94
-O3 -ffast-math -funroll-loops -ftree-vectorize -m32 -fira
# f951 1028.05 13.84
# f951 1029.07 14.03
-O3 -ffast-math -funroll-loops -ftree-vectorize -m32 -fira -fira-algorithm=CB
# f951 1002.64 13.56
# f951 1001.72 13.28

--------------

-O0 -g
# f951 141.44 9.69
# f951 142.12 8.85

-O0 -g -fira
# f951 137.47 8.40
# f951 137.23 7.14

-O0 -g -fira -fira-algorithm=CB
# f951 140.24 8.64
# f951 140.53 7.48

--------------

-O0 -fbounds-check
# f951 323.18 11.43
# f951 322.77 11.13

-O0 -fbounds-check -fira
# f951 326.24 10.21
# f951 319.46 8.86

-O0 -fbounds-check -fira -fira-algorithm=CB
# f951 325.73 11.95
# f951 323.79 8.37
