https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119

--- Comment #33 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
With #pragma GCC optimize ( "-O3" ) in place:

$ gfc -static -O2 -finline-matmul-limit=0 compare.f90 
$ ./a.out 
 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000      0.055      0.051      0.038      0.056
    4  2000      0.408      0.274      0.318      0.408
    8  2000      0.644      0.711      1.287      1.831
   16  2000      2.507      2.591      2.521      2.579
   32  2000      3.573      2.300      3.506      3.573
   64  2000      4.628      2.196      4.462      4.629
  128  2000      5.030      2.393      5.304      5.054
  256   477      4.802      2.367      5.573      4.854
  512    59      3.907      1.856      5.234      4.035
 1024     7      3.891      1.178      5.222      4.022
 2048     1      3.901      1.500      5.238      4.033

And with no #pragma, it is better than the -O3 version, most visibly in the
two explicit-shape columns at sizes 256 and up:

$ gfc -static -O2 -finline-matmul-limit=0 compare.f90 
$ ./a.out 
 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000      0.054      0.052      0.043      0.057
    4  2000      0.397      0.281      0.316      0.414
    8  2000      0.691      0.773      1.831      1.995
   16  2000      2.493      2.691      2.521      2.512
   32  2000      3.629      2.301      3.623      3.572
   64  2000      4.557      2.072      4.568      4.468
  128  2000      5.282      2.387      5.291      5.284
  256   477      5.629      2.369      5.620      5.605
  512    59      5.215      1.874      5.240      5.216
 1024     7      5.212      1.174      5.217      5.217
 2048     1      5.230      1.499      5.234      5.229

Still a good improvement over gfortran6 on the larger matrices.
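
For anyone wanting to try the same kind of comparison on their own kernel, here
is a minimal C sketch of what the pragma does and how a GIGAFLOPS figure like
the ones above could be derived. This is not the libgfortran code and not
compare.f90: naive_matmul and gigaflops are hypothetical names, and the 2*n^3
flop count per multiply is my assumption about the metric, not something taken
from the test program. The pragma is written in the "O3" form from the GCC
documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Ask GCC to compile the functions that follow at -O3, regardless of the
   -O level given on the command line. Delete this line to compare against
   the command-line optimization level instead. */
#pragma GCC optimize ("O3")

/* Hypothetical reference kernel (not the libgfortran implementation):
   C = A * B for n x n matrices stored column-major, as Fortran does. */
void naive_matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < n; i++)
            c[i + j * n] = 0.0;
        for (size_t k = 0; k < n; k++) {
            double bkj = b[k + j * n];
            /* Innermost loop walks down a column, so access is unit stride. */
            for (size_t i = 0; i < n; i++)
                c[i + j * n] += a[i + k * n] * bkj;
        }
    }
}

/* Assumed metric: 2*n^3 floating-point operations per multiply,
   repeated `loops` times, divided by elapsed wall-clock seconds. */
double gigaflops(size_t n, size_t loops, double seconds)
{
    assert(seconds > 0.0);
    double flops = 2.0 * (double)n * (double)n * (double)n * (double)loops;
    return flops / seconds / 1.0e9;
}
```

Building the file once with the pragma and once without, at the same -O2 on the
command line, gives the same kind of A/B comparison as the two tables above.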
