------- Comment #3 from jb at gcc dot gnu dot org  2007-04-29 20:19 -------
Hmm, try e.g. the following. With gfortran, the fixed-size arrays can use
__builtin_memset, whereas the allocatable arrays are zeroed with a loop. Save it
as .F90 (capital F) to force preprocessing, so that the SZ macro gets set.

! Test performance of different memset implementations
! compile with: gfortran -DSZ=36 memset.F90
! to test with array size 36 etc.
program testmemset
  implicit none

  real :: r(SZ)
  complex :: c(SZ)
  integer :: i(SZ)
  real, allocatable :: ra(:)
  complex, allocatable :: ca(:)
  integer, allocatable :: ia(:)

  real :: tr, tra, tc, tca, ti, tia, te
  integer :: n, ni

  n = SZ
  ni = 1000000000/SZ

  write (*,*) 'Array size = ', n, ', Doing ', ni, 'iterations.'

  allocate (ra(n), ca(n), ia(n))

  ! Fixed-size real array
  call cpu_time (tr)
  do n = 1, ni
     r = 0.0
  end do
  call cpu_time (te)
  tr = te - tr

  ! Allocatable real array
  call cpu_time (tra)
  do n = 1, ni
     ra = 0.0
  end do
  call cpu_time (te)
  tra = te - tra

  ! Fixed-size complex array
  call cpu_time (tc)
  do n = 1, ni
     c = (0.0, 0.0)
  end do
  call cpu_time (te)
  tc = te - tc

  ! Allocatable complex array
  call cpu_time (tca)
  do n = 1, ni
     ca = (0.0, 0.0)
  end do
  call cpu_time (te)
  tca = te - tca

  ! Fixed-size integer array
  call cpu_time (ti)
  do n = 1, ni
     i = 0
  end do
  call cpu_time (te)
  ti = te - ti

  ! Allocatable integer array
  call cpu_time (tia)
  do n = 1, ni
     ia = 0
  end do
  call cpu_time (te)
  tia = te - tia

  write (*,*) 'real    complex    int'
  write (*,*) '======================'
  write (*,*) 'With memset:'
  write (*,'(3(F15.8))') tr, tc, ti
  write (*,*) 'With loop:'
  write (*,'(3(F15.8))') tra, tca, tia

end program testmemset

On my athlon64, with the options "-O3 -funroll-loops -march=athlon64 -mfpmath=sse
-ftree-vectorize -ffast-math -fdefault-real-8":

 Array size =           36 , Doing     27777777 iterations.
 real    complex    int
 ======================
 With memset:
     1.65610300     1.53209600     0.61603800
 With loop:
     0.78804900     1.78411100     0.52403300


Without -fdefault-real-8, i.e. single precision:

 Array size =           36 , Doing     27777777 iterations.
 real    complex    int
 ======================
 With memset:
     1.12006998     0.92005789     0.61603904
 With loop:
     0.49203002     1.43208909     0.50803185

(Yeah, there should be more iterations to get better timings.)
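
One possible way to get steadier numbers, as a rough sketch only: repeat each
measurement a few times and keep the fastest run, which tends to filter out
scheduling noise. The names besttime, nrep, niter and chk, and the values
chosen for them, are my own assumptions and are not part of the test above.
Only the real arrays are shown; the complex and integer cases would follow the
same pattern. The checksum is printed so that the arrays are actually used; an
optimizer may still drop some of the redundant stores, so it is worth checking
the generated code before trusting the figures.

```fortran
! Sketch only: best-of-N timing for zeroing a fixed-size and an
! allocatable array.  sz, nrep and niter are illustrative assumptions.
program besttime
  implicit none

  integer, parameter :: sz = 36           ! assumed array size
  integer, parameter :: nrep = 5          ! assumed number of repeats
  integer, parameter :: niter = 10000000  ! assumed iterations per repeat

  real :: r(sz)
  real, allocatable :: ra(:)
  real :: t0, t1, best_fixed, best_alloc, chk
  integer :: k, rep

  allocate (ra(sz))

  ! Start from the largest representable value so the first run always wins.
  best_fixed = huge (best_fixed)
  best_alloc = huge (best_alloc)
  chk = 0.0

  do rep = 1, nrep
     ! Fixed-size array
     call cpu_time (t0)
     do k = 1, niter
        r = 0.0
     end do
     call cpu_time (t1)
     best_fixed = min (best_fixed, t1 - t0)
     chk = chk + sum (r)

     ! Allocatable array
     call cpu_time (t0)
     do k = 1, niter
        ra = 0.0
     end do
     call cpu_time (t1)
     best_alloc = min (best_alloc, t1 - t0)
     chk = chk + sum (ra)
  end do

  write (*,'(A,F15.8)') 'fixed-size,  best run: ', best_fixed
  write (*,'(A,F15.8)') 'allocatable, best run: ', best_alloc
  write (*,*) 'checksum = ', chk

  deallocate (ra)
end program besttime
```

This builds with a plain "gfortran besttime.f90" (no preprocessing needed, since
the size is a parameter here rather than the SZ macro).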


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31750
