> With the compiler from the ira branch on x86_64-linux, here are the
> timings reported by "gfortran -c -time -save-temps" with and without
> IRA (two timings provided for each set of option, to check
> reproducibility)

OK, I'm back with fresh numbers from the current IRA branch, rev. 135035, which I believe includes the fix for -O0 compilation time (thanks, by the way!). I'm still compiling the same huge testcase (from CP2K), which is a good example of relatively heavy use of Fortran 95 features. Memory used during compilation was up to 3 GB when optimization is turned on (this is an 8 GB system, and I checked that disk swap didn't come into play). This is on x86_64-linux.

  At -O0: 3% decrease wrt current, no further effect for -fira-algorithm=CB
  At -O0 -g: 3% decrease wrt current, slightly smaller (-1.5%) with -fira-algorithm=CB
  At -O1: 7% increase wrt current; -fira-algorithm=CB turns this into only a 2% increase
  At -O2: 5% increase for -fira; only 1.5% increase when -fira-algorithm=CB is used
  At -O2 -ffast-math, -O3 and -O3 -ffast-math: roughly the same as -O2, 3% to 5% increase for -fira, down to a 1%-2% increase when -fira-algorithm=CB is used.
  With -funroll-loops, -ftree-vectorize or both: again, roughly the same.

I've also tried gfortran's -fbounds-check option, which greatly increases the amount of code emitted by the front-end for a given source, and haven't seen any significant difference from the results reported above (in particular, no performance degradation).

I've also played with -m32 at various optimization levels, and the results are again in the same range as above for -m64.

*Conclusions*

All in all, the -O0 performance is now on par with the old allocator, and at higher optimization levels, we see a 3% to 5% regression. The CB algorithm is faster, with a regression of only 1.5% to 2%.

I'll now turn to benchmarking the generated code (I'll run the Polyhedron benchmark, which is widely known and referred to in the Fortran community). I don't have the guts to do a systematic check of the compiler's memory consumption, but I think it'd be nice if someone could do that.

FX

PS: I attach the file containing all timings.
For each set of options, I ran the compiler twice; when the timings differ significantly, that's because of other users on the machine (a rather underused dual-processor, dual-core system, with an average load during my tests of 1.09), and I thus take the smaller number for calculations.

-- 
FX Coudert
http://www.homepages.ucl.ac.uk/~uccafco/

-O0
# f951 135.59 6.88
# f951 135.91 9.86
-O0 -fira
# f951 131.26 6.41
# f951 131.19 6.49
-O0 -fira -fira-algorithm=CB
# f951 131.20 6.76
# f951 130.84 6.80
--------------
-O1
# f951 477.87 14.74
# f951 478.26 14.46
-O1 -fira
# f951 511.43 14.69
# f951 510.64 13.56
-O1 -fira -fira-algorithm=CB
# f951 488.57 14.45
# f951 488.54 13.67
------------------------------
-O2
# f951 670.03 16.17
# f951 669.36 14.80
-O2 -fira
# f951 701.83 14.23
# f951 703.29 15.17
-O2 -fira -fira-algorithm=CB
# f951 682.19 15.01
# f951 678.86 15.06
------------------------------
-O2 -ffast-math
# f951 675.44 16.60
# f951 673.41 16.63
-O2 -ffast-math -fira
# f951 706.19 14.39
# f951 706.00 13.76
-O2 -ffast-math -fira -fira-algorithm=CB
# f951 688.10 14.68
# f951 736.99 18.26
------------------------------
-O3
# f951 844.27 15.13
# f951 845.93 14.35
-O3 -fira
# f951 872.07 16.54
# f951 873.54 13.72
-O3 -fira -fira-algorithm=CB
# f951 854.09 14.85
# f951 847.93 16.90
------------------------------
-O3 -ffast-math
# f951 846.92 14.47
# f951 846.12 16.58
-O3 -ffast-math -fira
# f951 877.64 14.22
# f951 883.09 13.62
-O3 -ffast-math -fira -fira-algorithm=CB
# f951 865.35 13.44
# f951 891.76 16.52
------------------------------
-O3 -ffast-math -funroll-loops
# f951 1112.40 15.43
# f951 1091.32 15.83
-O3 -ffast-math -funroll-loops -fira
# f951 1123.51 13.97
# f951 1126.89 15.50
-O3 -ffast-math -funroll-loops -fira -fira-algorithm=CB
# f951 1106.21 15.21
# f951 1108.12 15.91
------------------------------
-O3 -ffast-math -funroll-loops -ftree-vectorize
# f951 1093.59 14.93
# f951 1092.91 15.98
-O3 -ffast-math -funroll-loops -ftree-vectorize -fira
# f951 1149.13 15.80
# f951 1134.78 14.84
-O3 -ffast-math -funroll-loops -ftree-vectorize -fira -fira-algorithm=CB
# f951 1107.87 14.71
# f951 1092.80 13.97
------------------------------
-O0 -m32
# f951 133.29 6.63
# f951 133.38 6.97
-O0 -m32 -fira
# f951 132.86 7.68
# f951 134.41 7.03
-O0 -m32 -fira -fira-algorithm=CB
# f951 133.95 6.98
# f951 132.94 5.96
------------------------------
-O2 -m32
# f951 654.35 14.56
# f951 652.43 13.62
-O2 -m32 -fira
# f951 675.74 14.10
# f951 686.01 13.97
-O2 -m32 -fira -fira-algorithm=CB
# f951 659.19 14.44
# f951 666.36 14.48
------------------------------
-O3 -ffast-math -funroll-loops -ftree-vectorize -m32
# f951 974.28 15.45
# f951 1024.43 15.94
-O3 -ffast-math -funroll-loops -ftree-vectorize -m32 -fira
# f951 1028.05 13.84
# f951 1029.07 14.03
-O3 -ffast-math -funroll-loops -ftree-vectorize -m32 -fira -fira-algorithm=CB
# f951 1002.64 13.56
# f951 1001.72 13.28
--------------
-O0 -g
# f951 141.44 9.69
# f951 142.12 8.85
-O0 -g -fira
# f951 137.47 8.40
# f951 137.23 7.14
-O0 -g -fira -fira-algorithm=CB
# f951 140.24 8.64
# f951 140.53 7.48
--------------
-O0 -fbounds-check
# f951 323.18 11.43
# f951 322.77 11.13
-O0 -fbounds-check -fira
# f951 326.24 10.21
# f951 319.46 8.86
-O0 -fbounds-check -fira -fira-algorithm=CB
# f951 325.73 11.95
# f951 323.79 8.37
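
For anyone who wants to recompute the percentages from the timings file, here is a rough Python sketch. It is not the script used for the numbers in the message. It assumes that the first figure on each "# f951" line is user CPU time and the second is system time (the usual layout of gcc's -time output), that only user time enters the comparison, and that the smaller of the two runs is taken, as described in the PS. The helper names parse_timings and percent_change are made up for this illustration, and only the -O2 group is pasted in as sample input.

```python
import re

def parse_timings(text):
    """Map each option line to the list of (user, sys) pairs that follow it."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and dashed separators
        m = re.match(r"#\s+f951\s+([\d.]+)\s+([\d.]+)$", line)
        if m:
            if current is not None:
                results[current].append((float(m.group(1)), float(m.group(2))))
            continue
        # Anything else is taken to be a line of compiler options.
        current = line
        results[current] = []
    return results

def percent_change(baseline, candidate):
    """Relative change (in %) between the fastest user times of two option sets."""
    base = min(user for user, _ in baseline)
    cand = min(user for user, _ in candidate)
    return 100.0 * (cand - base) / base

# Excerpt of the timings file: the -O2 group only.
sample = """
-O2
# f951 670.03 16.17
# f951 669.36 14.80
-O2 -fira
# f951 701.83 14.23
# f951 703.29 15.17
-O2 -fira -fira-algorithm=CB
# f951 682.19 15.01
# f951 678.86 15.06
"""

timings = parse_timings(sample)
for opts in ("-O2 -fira", "-O2 -fira -fira-algorithm=CB"):
    change = percent_change(timings["-O2"], timings[opts])
    print(f"{opts}: {change:+.1f}% wrt -O2")
```

Under those assumptions the -O2 group comes out close to the 5% and 1.5% increases quoted in the message; other groups may differ slightly depending on how the rounding and the choice of user versus total time were handled.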