Hi Jerry,
> I am curious about what performance gain results from this? I can see
> saving a library call to our runtime libraries. Do you have some timing
> results?
The speedup can be quite drastic for small matrices which can be
completely unrolled by -O3:
b1.f90:
program main
use b2
implicit none
real, dimension(3,3) :: a, b, c
integer :: i
call random_number(a)
call random_number(b)
do i=1,10**8
c = matmul(a,b)
call bar(b,c)
end do
end program main
b2.f90:
module b2
contains
subroutine bar(b,c)
real, dimension(3,3) :: b,c
end subroutine bar
end module b2
ig25@linux-fd1f:~/Krempel/Matmul> gfortran -O3 -fno-frontend-optimize
b2.f90 b1.f90 && time ./a.out
real 0m15.411s
user 0m15.404s
sys 0m0.001s
ig25@linux-fd1f:~/Krempel/Matmul> gfortran -O3 b2.f90 b1.f90 && time ./a.out
real 0m1.736s
user 0m1.735s
sys 0m0.001s