------- Comment #15 from changpeng dot fang at amd dot com  2010-07-01 00:34 -------
Unrolling of the peeled loop is partly responsible for the increase in
test_fpu.f90 compilation time and code size. Vectorization peels a few
iterations of the loop; the prefetching and unrolling passes do not recognize
that a loop is a peeled version and still unroll it.

MODULE kinds
   INTEGER, PARAMETER :: RK8 = SELECTED_REAL_KIND(15, 300)
END MODULE kinds
! --------------------------------------------------------------------
PROGRAM TEST_FPU  ! A number-crunching benchmark using matrix inversion.
USE kinds         ! Implemented by:    David Frank  dave_fr...@hotmail.com
IMPLICIT NONE     ! Gauss  routine by: Tim Prince   n...@aol.com
                  ! Crout  routine by: James Van Buskirk  tor...@ix.netcom.com
                  ! Lapack routine by: Jos Bergervoet berge...@iaehv.nl

REAL(RK8) :: pool(101, 101,1000), a(101, 101)
INTEGER :: i

      DO i = 1,1000
         a = pool(:,:,i)         ! get next matrix to invert
      END DO

END PROGRAM TEST_FPU


In this example, prefetching will unroll three versions of the innermost loop.
If we turn off the vectorizer, it unrolls the only loop.
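
To picture why several loops end up being unrolled, here is a hand-written
sketch of how one copy loop can be split once the vectorizer peels it: a scalar
prologue, a main body, and a scalar epilogue. This is an illustration only, not
GCC's actual output; the names n, vf and npeel and their values are
hypothetical, chosen just to make all three loops non-empty.

```fortran
! Hand-written illustration only; NOT what GCC actually emits.
! n, vf (vector factor) and npeel are hypothetical values for this sketch.
PROGRAM peel_sketch
IMPLICIT NONE
INTEGER, PARAMETER :: RK8 = SELECTED_REAL_KIND(15, 300)
INTEGER, PARAMETER :: n = 101, vf = 4, npeel = 2
REAL(RK8) :: src(n), dst(n)
INTEGER :: j, main_end

DO j = 1, n
   src(j) = REAL(j, RK8)
END DO
dst = 0.0_RK8

! Version 1: peeled prologue, a few scalar iterations (e.g. for alignment)
DO j = 1, npeel
   dst(j) = src(j)
END DO

! Version 2: main body, vf elements per iteration (stands in for vector code)
main_end = npeel + ((n - npeel) / vf) * vf
DO j = npeel + 1, main_end, vf
   dst(j:j+vf-1) = src(j:j+vf-1)
END DO

! Version 3: scalar epilogue for the leftover iterations
DO j = main_end + 1, n
   dst(j) = src(j)
END DO

! Self-check: the three loops together must reproduce a plain copy
IF (ANY(dst /= src)) STOP 1
PRINT *, 'copy ok'
END PROGRAM peel_sketch
```

Each of these loops is a separate unrolling candidate for a later pass, even
though the prologue and epilogue run only a few iterations.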

In addition, when -fprefetch-loop-arrays and -funroll-loops are turned on at
the same time, both will unroll the same loop. This is over-unrolling;
-funroll-loops should recognize that the loop has already been unrolled by
prefetching.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576
