On 11/13/2016 11:03 PM, Thomas Koenig wrote:
Hi Jerry,

I think this

+      /* Parameter adjustments */
+      c_dim1 = m;
+      c_offset = 1 + c_dim1;

should be

+      /* Parameter adjustments */
+      c_dim1 = rystride;
+      c_offset = 1 + c_dim1;
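Why the leading dimension has to be the stride and not m: in f2c-style code, c_dim1 is the distance in elements between consecutive columns of C, and c_offset turns 1-based (i, j) into a 0-based offset. The sketch below assumes rystride is that column stride for the result (its leading dimension). Whenever it differs from the row count m, for instance if the result occupies a section of a larger array, indexing with m lands in the wrong place. The helpers cm_offset and cm_store are hypothetical illustrations, not libgfortran code.

```c
#include <stddef.h>

/* Offset of the 1-based element (i, j) in a column-major array whose
   consecutive columns are ld elements apart (the leading dimension).
   f2c-style code reaches the same element with
       c_dim1 = ld;  c_offset = 1 + c_dim1;  c -= c_offset;  c[i + j * c_dim1]
   here computed without moving the pointer before the array start. */
static ptrdiff_t
cm_offset (ptrdiff_t ld, ptrdiff_t i, ptrdiff_t j)
{
  ptrdiff_t c_offset = 1 + ld;
  return i + j * ld - c_offset;   /* == (i - 1) + (j - 1) * ld */
}

/* Store value at 1-based (i, j) of a column-major C with column stride ld. */
static void
cm_store (double *c, ptrdiff_t ld, ptrdiff_t i, ptrdiff_t j, double value)
{
  c[cm_offset (ld, i, j)] = value;
}
```

Example: if a 2x3 result sits in the top two rows of a 4x3 array, then m = 2 but the column stride is 4. With ld = 4, element (1, 2) goes to offset 4, the top of the second column; with ld = m = 2 it goes to offset 2, which is row 3 of the first column and outside the result.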

Regarding options for matmul:  It is possible to add the
options to the lines in Makefile.in

# Turn on vectorization and loop unrolling for matmul.
$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize -funroll-loops

This is a great step forward.  I think we can close most matmul-related
PRs once this patch has been applied.

Regards

    Thomas


With Thomas's suggestion, I can remove the #pragma optimize from the source code by doing this (long lines wrapped as shown):

diff --git a/libgfortran/Makefile.am b/libgfortran/Makefile.am
index 39d3e11..9ee17f9 100644
--- a/libgfortran/Makefile.am
+++ b/libgfortran/Makefile.am
@@ -850,7 +850,7 @@ intrinsics/dprod_r8.f90 \
 intrinsics/f2c_specifics.F90

 # Turn on vectorization and loop unrolling for matmul.
-$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize -funroll-loops
+$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ffast-math -fno-protect-parens -fstack-arrays -ftree-vectorize -funroll-loops --param max-unroll-times=4 -ftree-loop-vectorize
 # Logical matmul doesn't vectorize.
 $(patsubst %.c,%.lo,$(notdir $(i_matmull_c))): AM_CFLAGS += -funroll-loops

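Part of what the added -ffast-math buys is vectorization of floating-point accumulations. A minimal sketch, using an illustrative dot-product function rather than the actual libgfortran kernel: with strict IEEE semantics GCC must add the terms in source order, which on most targets keeps it from vectorizing a loop like this one. -ffast-math implies -fassociative-math, which lets the vectorizer keep several partial sums and combine them at the end, possibly changing the last bits of the result.

```c
#include <stddef.h>

/* Plain sum reduction, the shape of loop that accumulates each element
   of a matrix product.  Without reassociation the additions must happen
   strictly in order; with -ffast-math GCC may split them into vector
   lanes and add the lanes together after the loop. */
static double
dot (const double *a, const double *b, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}
```

Compiling this with gcc -O2 -ftree-vectorize -fopt-info-vec, once with and once without -ffast-math, should show whether the loop was vectorized on a given target.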

Comparing gfortran 6 vs. 7 (test program posted in PR51119):

$ gfc6 -static -Ofast -finline-matmul-limit=32 -funroll-loops --param max-unroll-times=4 compare.f90
$ ./a.out
 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000     11.928      0.047      0.082      0.138
    4  2000      1.455      0.220      0.371      0.316
    8  2000      1.476      0.737      0.704      1.574
   16  2000      4.536      3.755      2.825      3.820
   32  2000      6.070      5.443      3.124      5.158
   64  2000      5.423      5.355      5.405      5.413
  128  2000      5.913      5.841      5.917      5.917
  256   477      5.865      5.252      5.863      5.862
  512    59      2.794      2.841      2.794      2.791
 1024     7      1.662      1.356      1.662      1.661
 2048     1      1.753      1.724      1.753      1.754

$ gfc -static -Ofast -finline-matmul-limit=32 -funroll-loops --param max-unroll-times=4 compare.f90
$ ./a.out
 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000     12.146      0.042      0.090      0.146
    4  2000      1.496      0.232      0.384      0.325
    8  2000      2.330      0.765      0.763      0.965
   16  2000      4.611      4.120      2.792      3.830
   32  2000      6.068      5.265      3.102      4.859
   64  2000      6.527      5.329      6.425      6.495
  128  2000      8.207      5.643      8.336      8.441
  256   477      9.210      4.967      9.367      9.299
  512    59      8.330      2.772      8.422      8.342
 1024     7      8.430      1.378      8.511      8.424
 2048     1      8.339      1.718      8.425      8.322
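To read the table in wall-clock terms, assume compare.f90 uses the usual count of 2*n^3 floating-point operations per n x n matrix product. This is an assumption; the exact count the test program uses is not shown here. The function names below are illustrative only.

```c
/* Conventional flop count for C = A*B with n x n operands:
   n*n result elements, each needing n multiplies and n adds. */
static double
matmul_flops (double n)
{
  return 2.0 * n * n * n;
}

/* GFLOPS achieved by `loops` products of size n taking `seconds` total. */
static double
gflops (double n, double loops, double seconds)
{
  return matmul_flops (n) * loops / seconds / 1e9;
}

/* Inverse: total seconds implied by a reported GFLOPS figure. */
static double
seconds_for (double n, double loops, double gflops_value)
{
  return matmul_flops (n) * loops / (gflops_value * 1e9);
}
```

Under that assumption, the 2048 row (Loops = 1) corresponds to roughly 9.8 s at 1.753 GFLOPS with gfortran 6 and roughly 2.06 s at 8.339 GFLOPS with gfortran 7.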

I do think we need to adjust the default inline limit (-finline-matmul-limit), but that should be done separately from this patch.

With these changes, OK for trunk?

Regards,

Jerry
