------- Comment #15 from changpeng dot fang at amd dot com  2010-07-01 00:34 -------

Unrolling of the peeled loop is part of the reason for the increase in test_fpu.f90 compilation time and code size. Vectorization peels a few iterations off the loop, and the prefetching and unrolling passes do not recognize that a loop is a peeled version, so they still unroll it.
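As a rough illustration of why there end up being several loops to unroll, here is a hand-written sketch of the shape the vectorizer gives a simple copy loop like the innermost loop of the test case: a scalar prologue of peeled iterations, a vector body, and a scalar epilogue for the remainder. This is a hypothetical sketch, not GCC output. The module and routine names (versioned_copy, copy3), the vectorization factor VF = 2 (two doubles per SSE2 register) and the peel count npeel are assumptions chosen for illustration. A pass that unrolls each of these loops emits unrolled copies of all of them, where the source had only one loop.

```fortran
! Hypothetical sketch of the loop versions the vectorizer produces for a
! simple copy loop such as the innermost loop of a = pool(:,:,i).
! VF and npeel are illustrative assumptions, not values computed by GCC.
MODULE versioned_copy
  IMPLICIT NONE
  INTEGER, PARAMETER :: RK8 = SELECTED_REAL_KIND(15, 300)
  INTEGER, PARAMETER :: VF = 2   ! assumed vectorization factor: 2 doubles per vector
CONTAINS
  ! Copies src(1:n) into dst(1:n) using three loop versions.
  SUBROUTINE copy3(dst, src, n, npeel)
    INTEGER, INTENT(IN) :: n, npeel
    REAL(RK8), INTENT(IN) :: src(n)
    REAL(RK8), INTENT(OUT) :: dst(n)
    INTEGER :: k, kstart, nvec

    ! Version 1: scalar prologue, the peeled iterations (e.g. for alignment).
    DO k = 1, MIN(npeel, n)
      dst(k) = src(k)
    END DO

    ! Version 2: vector body, each trip copies VF elements at once.
    kstart = MIN(npeel, n) + 1
    nvec = ((n - kstart + 1) / VF) * VF
    DO k = kstart, kstart + nvec - 1, VF
      dst(k:k+VF-1) = src(k:k+VF-1)
    END DO

    ! Version 3: scalar epilogue for the iterations left over after the vector body.
    DO k = kstart + nvec, n
      dst(k) = src(k)
    END DO
  END SUBROUTINE copy3
END MODULE versioned_copy
```

For a 101-element column, copy3(dst, src, 101, 1) copies element 1 in the prologue and elements 2 to 101 in 50 vector trips, leaving nothing for the epilogue; with npeel = 0 the vector body covers elements 1 to 100 and the epilogue copies element 101.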
MODULE kinds
  INTEGER, PARAMETER :: RK8 = SELECTED_REAL_KIND(15, 300)
END MODULE kinds
! --------------------------------------------------------------------
PROGRAM TEST_FPU  ! A number-crunching benchmark using matrix inversion.
USE kinds         ! Implemented by:    David Frank       dave_fr...@hotmail.com
IMPLICIT NONE     ! Gauss  routine by: Tim Prince        n...@aol.com
                  ! Crout  routine by: James Van Buskirk tor...@ix.netcom.com
                  ! Lapack routine by: Jos Bergervoet    berge...@iaehv.nl
REAL(RK8) :: pool(101, 101,1000), a(101, 101)
INTEGER :: i

DO i = 1,1000
   a = pool(:,:,i)         ! get next matrix to invert
END DO

END PROGRAM TEST_FPU

In this example, prefetching unrolls three versions of the innermost loop. If we turn off the vectorizer, it unrolls the single remaining loop. In addition, when -fprefetch-loop-arrays and -funroll-loops are turned on at the same time, both passes unroll the same loop. This is over-unrolling: -funroll-loops should recognize that the loop has already been unrolled by prefetching.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576