https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #7 from Adam Hirst <adam at aphirst dot karoo.co.uk> ---
OK, I tried a little harder, and was able to get a performance increase:

  type(Vect3D) pure function TP_LEFT(NU, D, NV) result(tensorproduct)
    real(dp),     intent(in) :: NU(4), NV(4)
    type(Vect3D), intent(in) :: D(4,4)
    real(dp) :: Dx(4,4), Dy(4,4), Dz(4,4), NUDx(4), NUDy(4), NUDz(4)

    Dx = D%x
    Dy = D%y
    Dz = D%z
    NUDx = matmul(NU, Dx)
    NUDy = matmul(NU, Dy)
    NUDz = matmul(NU, Dz)
    tensorproduct%x = dot_product(NUDx, NV)
    tensorproduct%y = dot_product(NUDy, NV)
    tensorproduct%z = dot_product(NUDz, NV)
  end function

With this version (still using -Ofast), the matmul path sped up by a factor of about 6 on my machine, which on its own would have made it faster than the "explicit DO" approach. However, the DO version also gained a huge speed-up under -Ofast, so the net result is that matmul here is about half as fast as the explicit loop.

But here is where things get really interesting. If I also use -flto on this post's matmul code path, the matmul implementation becomes twice as fast as the (already now VERY fast) DO implementation. This huge boost doesn't seem to apply to the version of TP_LEFT from my previous post, nor to the original TP_LEFT from the initial ticket submission.

In conclusion: it seems that your remark about matmul inlining also applies to dot_product.

NOTE: For the -flto tests, gcc is clever enough to realise that we're not actually using these results, so I have to save tp(1:i_max) and have the user specify an element to print, in order to force the computation. I of course put those accesses "outside" each pair of cpu_time calls.

As an aside, I also tried the effect of -fexpensive-optimizations, but it did more or less nothing.

---

By the way, are there any thoughts yet on the random number calls taking /longer/ once optimisations are enabled? If I'm reading my results right, -flto seems to "fix" that, but it doesn't seem obvious why it should be occurring in the first place.
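For clarity, here is a minimal sketch of the kind of timing harness the NOTE above describes. The names tp, i_max, and the stand-in workload are illustrative, not the exact code from my driver; the point is only the shape: a run-time-chosen element is printed after the timed region, so the compiler cannot prove the results unused and delete the loop under -flto.

```fortran
program timing_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i_max = 1000000
  real(dp) :: tp(i_max), t0, t1
  integer  :: i, which

  call cpu_time(t0)
  do i = 1, i_max
    tp(i) = sqrt(real(i, dp))   ! stand-in for the TP_LEFT call being timed
  end do
  call cpu_time(t1)

  ! Ask the user for an index and print that element. Because the index
  ! is only known at run time, the whole tp array is live and the loop
  ! cannot be eliminated as dead code. This access sits outside the
  ! cpu_time pair, so it does not pollute the measurement.
  read (*, *) which
  print *, 'tp(', which, ') =', tp(which)
  print *, 'elapsed:', t1 - t0, 's'
end program timing_sketch
```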