On Fri, Oct 28, 2011 at 06:30:35PM +0200, Mikael Morin wrote:
> On Friday 28 October 2011 15:56:36 Jack Howarth wrote:
> > Mikael,
> >    The complete patch bootstraps current FSF gcc trunk on
> > x86_64-apple-darwin11 and the resulting gfortran compiler can compile the
> > Polyhedron 2005 benchmarks using...
> >
> > Compile Command : gfortran-fsf-4.7 -O3 -ffast-math -funroll-loops -flto
> > -fwhole-program %n.f90 -o %n
> >
> > without runtime regressions. However I don't seem to see any particular
> > performance improvements with your patches applied. In fact, a few
> > benchmarks including nf and test_fpu seem to show slower runtimes
> > (~8-11%). Have you done any benchmarking with and without the proposed
> > patches?
> >              Jack
>
> Not myself, but the previous versions of the patch have been reported to give
> sensitive improvement on "tonto" here:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c26
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43829#c35
>
> Since those versions, the array constructor handling has been improved, and a
> few mostly cosmetic changes have been applied, so I expect the posted patch to
> be on par with the previous ones, possibly slightly better.
>
> Now regarding your regressions, it is quite a lot worse, and quite unexpected.
> I have just looked at test_fpu.f90 and nf.f90 from a polyhedron source I have
> found at http://www.polyhedron.com/web_images/documents/pb05.zip.
> There is no call to product in them, and both use only single-argument sum
> calls, which are not (or shouldn't be) impacted by my patch (scalar cases).
> Indeed, if I compare the code produced using -fdump-tree-original, there is
> zero difference in nf.f90, and in test_fpu.f90 only slight variations which
> are very very unlikely to cause the regression you see (see attached diff).
>
> Could you double check your figures, and/or that the regressions are really
> caused by my patch?

Mikael,
   The problem was the quick.par testing with the patch applied. Full
standard.par testing suggests that identical binaries are produced for
pb05 (by size anyway)...

Using built-in specs.
COLLECT_GCC=gcc-fsf-4.7
COLLECT_LTO_WRAPPER=/sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.2.0/4.7.0/lto-wrapper
Target: x86_64-apple-darwin11.2.0
Configured with: ../gcc-4.7-20111028/configure --prefix=/sw --prefix=/sw/lib/gcc4.7 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.7/info --with-build-config=bootstrap-lto --enable-stage1-languages=c,lto --enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-ppl=/sw --with-cloog=/sw --with-mpc=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.7 --enable-checking=yes --enable-cloog-backend=isl
Thread model: posix
gcc version 4.7.0 20111028 (experimental) (GCC)

prepatch at r180613

Date & Time     : 28 Oct 2011 13:47:42
Test Name       : gfortran_lin_O3_wholeprogram
Compile Command : gfortran-fsf-4.7 -O3 -ffast-math -funroll-loops -flto -fwhole-program %n.f90 -o %n
Benchmarks      : ac aermod air capacita channel doduc fatigue gas_dyn induct linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times   : 2000.0
Target Error %  : 0.100
Minimum Repeats : 10
Maximum Repeats : 100

Benchmark  Compile  Executable  Ave Run   Number   Estim
Name        (secs)     (bytes)   (secs)  Repeats   Err %
---------  -------  ----------  -------  -------  ------
ac            6.75       55000     8.16       10  0.0522
aermod      119.95     1237720    16.83       13  0.0956
air          18.38      106960     5.77       33  0.0949
capacita      6.48       77240    32.61       17  0.0903
channel       2.21       34904     2.05       19  0.0493
doduc        20.19      196496    25.98       17  0.0978
fatigue       7.20       81616     5.98       16  0.0998
gas_dyn      13.58      119824     4.11       44  0.0854
induct       12.90      145096    12.86       13  0.0936
linpk         1.90       26104    15.51       22  0.0667
mdbx          6.52       81104    11.32       23  0.0995
nf            6.66       71872    27.17       38  0.0891
protein      21.47      127264    31.24       15  0.0726
rnflow       19.51      131056    24.42       19  0.0776
test_fpu     12.09       97272     7.89       22  0.0399
tfft          1.63       22464     1.87       21  0.0169

Geometric Mean Execution Time = 10.54 seconds

postpatch at r180613

Date & Time     : 28 Oct 2011 16:42:27
Test Name       : gfortran_lin_O3_wholeprogram
Compile Command : gfortran-fsf-4.7 -O3 -ffast-math -funroll-loops -flto -fwhole-program %n.f90 -o %n
Benchmarks      : ac aermod air capacita channel doduc fatigue gas_dyn induct linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times   : 2000.0
Target Error %  : 0.100
Minimum Repeats : 10
Maximum Repeats : 100

Benchmark  Compile  Executable  Ave Run   Number   Estim
Name        (secs)     (bytes)   (secs)  Repeats   Err %
---------  -------  ----------  -------  -------  ------
ac            6.44       55000     8.16       10  0.0304
aermod      120.51     1237720    16.88       14  0.0968
air          19.54      106960     5.78       16  0.0774
capacita      6.40       77240    32.58       22  0.0796
channel       2.16       34904     2.05       43  0.0893
doduc        22.76      196496    25.61       18  0.0407
fatigue       6.99       81616     5.99       16  0.0852
gas_dyn      12.92      119824     4.08       28  0.0866
induct       14.28      145096    12.85       12  0.0829
linpk         1.97       26104    15.50       14  0.0722
mdbx          6.52       81104    11.12       20  0.0151
nf            6.44       71872    27.51       39  0.0935
protein      20.86      127264    31.21       12  0.0603
rnflow       20.45      131056    24.40       14  0.0828
test_fpu     12.10       97272     7.89       24  0.0780
tfft          1.63       22464     1.87       18  0.0878

Geometric Mean Execution Time = 10.53 seconds

>
> Mikael

> --- test_fpu.f90.003t.original.master 2011-10-28 18:08:53.000000000 +0200
> +++ test_fpu.f90.003t.original.patched 2011-10-28 18:22:28.000000000 +0200
> @@ -1929,6 +1929,7 @@
>        D.2297 = offset.65 + -1;
>        atmp.64.dim[0].ubound = D.2297;
>        pos.61 = D.2297 >= 0 ? 1 : 0;
> +      offset.62 = 1;
>        {
>          integer(kind=8) S.67;
>
> @@ -1936,7 +1937,6 @@
>          while (1)
>            {
>              if (S.67 > D.2297) goto L.133;
> -            offset.62 = 1;
>              if (ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]> > limit.63)
>                {
>                  limit.63 = ABS_EXPR <(*(real(kind=8)[0] * restrict) atmp.64.data)[S.67]>;
> @@ -2406,14 +2406,14 @@
>          integer(kind=8) D.2457;
>          integer(kind=8) S.104;
>
> -        D.2457 = D.2436 + D.2442;
> -        D.2458 = stride.45;
> +        D.2457 = stride.45;
> +        D.2458 = D.2436 + D.2442;
>          D.2459 = D.2443 * stride.45 + D.2439;
>          S.104 = 0;
>          while (1)
>            {
>              if (S.104 > D.2444) goto L.149;
> -            (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2458 + D.2457];
> +            (*(real(kind=8)[0:] * restrict) atmp.103.data)[S.104] = (*b)[(S.104 + D.2454) * D.2457 + D.2458];
>              S.104 = S.104 + 1;
>            }
>          L.149:;
> @@ -2486,13 +2486,13 @@
>          integer(kind=8) D.2479;
>          integer(kind=8) S.106;
>
> -        D.2479 = D.2473 + D.2476;
> -        D.2480 = stride.45;
> +        D.2479 = stride.45;
> +        D.2480 = D.2473 + D.2476;
>          S.106 = D.2471;
>          while (1)
>            {
>              if (S.106 > D.2472) goto L.152;
> -            (*b)[(S.106 + D.2477) * D.2480 + D.2479] = (*temp)[S.106 + -1];
> +            (*b)[(S.106 + D.2477) * D.2479 + D.2480] = (*temp)[S.106 + -1];
>              S.106 = S.106 + 1;
>            }
>          L.152:;
> @@ -2756,13 +2756,13 @@
>          integer(kind=8) D.2549;
>          integer(kind=8) S.112;
>
> -        D.2549 = D.2543 + D.2546;
> -        D.2550 = stride.45;
> +        D.2549 = stride.45;
> +        D.2550 = D.2543 + D.2546;
>          S.112 = 1;
>          while (1)
>            {
>              if (S.112 > D.2542) goto L.168;
> -            (*b)[(S.112 + D.2547) * D.2550 + D.2549] = (*temp)[S.112 + -1];
> +            (*b)[(S.112 + D.2547) * D.2549 + D.2550] = (*temp)[S.112 + -1];
>              S.112 = S.112 + 1;
>            }
>          L.168:;
> @@ -2885,13 +2885,13 @@
>          integer(kind=8) D.2582;
>          integer(kind=8) S.115;
>
> -        D.2582 = D.2575 + D.2579;
> -        D.2583 = stride.45;
> +        D.2582 = stride.45;
> +        D.2583 = D.2575 + D.2579;
>          S.115 = 1;
>          while (1)
>            {
>              if (S.115 > D.2578) goto L.176;
> -            (*temp)[S.115 + -1] = (*b)[(S.115 + D.2580) * D.2583 + D.2582];
> +            (*temp)[S.115 + -1] = (*b)[(S.115 + D.2580) * D.2582 + D.2583];
>              S.115 = S.115 + 1;
>            }
>          L.176:;
> @@ -3348,6 +3348,7 @@
>        D.2733 = (integer(kind=8)) *n;
>        D.2734 = (integer(kind=8)) k;
>        pos.146 = D.2732 <= D.2733 ? 1 : 0;
> +      offset.147 = 1 - D.2732;
>        {
>          integer(kind=8) D.2736;
>          integer(kind=8) S.149;
> @@ -3357,7 +3358,6 @@
>          while (1)
>            {
>              if (S.149 > D.2733) goto L.191;
> -            offset.147 = 1 - D.2732;
>              if (ABS_EXPR <(*b)[S.149 + D.2736]> > limit.148)
>                {
>                  limit.148 = ABS_EXPR <(*b)[S.149 + D.2736]>;
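
For anyone who wants to cross-check the prepatch and postpatch run times reported in this message, here is a minimal sketch in Python. It is not part of the Polyhedron harness; the names BENCHMARKS, PRE, POST, geometric_mean and percent_change are made up for illustration, and the numbers are simply the "Ave Run (secs)" columns copied from the two tables.

```python
import math

# Benchmark names in the order used by the two result tables.
BENCHMARKS = ["ac", "aermod", "air", "capacita", "channel", "doduc",
              "fatigue", "gas_dyn", "induct", "linpk", "mdbx", "nf",
              "protein", "rnflow", "test_fpu", "tfft"]

# "Ave Run (secs)" column, prepatch and postpatch, copied from the tables.
PRE = [8.16, 16.83, 5.77, 32.61, 2.05, 25.98, 5.98, 4.11,
       12.86, 15.51, 11.32, 27.17, 31.24, 24.42, 7.89, 1.87]
POST = [8.16, 16.88, 5.78, 32.58, 2.05, 25.61, 5.99, 4.08,
        12.85, 15.50, 11.12, 27.51, 31.21, 24.40, 7.89, 1.87]


def geometric_mean(times):
    """Geometric mean, computed via logs to avoid a huge product."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


def percent_change(before, after):
    """Relative change in percent; positive means the run got slower."""
    return 100.0 * (after - before) / before


# Per-benchmark run-time change between the two compilers.
DELTAS = {name: percent_change(b, a)
          for name, b, a in zip(BENCHMARKS, PRE, POST)}

if __name__ == "__main__":
    for name, delta in DELTAS.items():
        print(f"{name:<9} {delta:+6.2f}%")
    print(f"geometric mean prepatch : {geometric_mean(PRE):.2f} s")
    print(f"geometric mean postpatch: {geometric_mean(POST):.2f} s")
```

With these figures every per-benchmark change stays within 2% in either direction and test_fpu is unchanged, which fits the observation that the executables have identical sizes.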