http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47298

           Summary: -O3 destroys beautifully vectorized code obtained at
                    -O2
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: joost.vandevond...@pci.uzh.ch


Current trunk generates really fast vectorized code for the following testcase
(a 12x12x12 matrix multiply, c=c+a*b, benchmarked with a, b and c in cache), as
can be seen from the assembly:

> cat compare.f90
   SUBROUTINE HARD_NN_12_12_12(C,A,B) 
      REAL(KIND=8), INTENT(INOUT) :: C(12,*)
      REAL(KIND=8), INTENT(IN)    :: B(12,*), A(12,*)
      INTEGER ::i,j,l
      DO j=1,12 ; DO i=1,12; DO l=1,12
         C(i,j)=C(i,j)+A(i,l)*B(l,j)
      ENDDO ; ENDDO ; ENDDO
   END SUBROUTINE HARD_NN_12_12_12
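
For anyone reproducing this, a minimal sketch (not part of the original report) of a sanity check that the kernel really computes c=c+a*b, by comparing it against the MATMUL intrinsic. The program name CHECK_HARD_NN, the random inputs and the 1.0D-12 tolerance are assumptions made for illustration. The kernel is repeated as an internal procedure only so the sketch stands alone.

```fortran
! Sketch: check the hand-written kernel against the MATMUL intrinsic.
! CHECK_HARD_NN and the tolerance are illustrative, not from the report.
PROGRAM CHECK_HARD_NN
   IMPLICIT NONE
   REAL(KIND=8), DIMENSION(12,12) :: A, B, C, CREF
   REAL(KIND=8) :: MAXERR

   ! Fill the inputs with non-trivial values so errors are visible.
   CALL RANDOM_NUMBER(A)
   CALL RANDOM_NUMBER(B)
   CALL RANDOM_NUMBER(C)

   ! Reference result, computed before C is updated in place.
   CREF = C + MATMUL(A, B)

   CALL HARD_NN_12_12_12(C, A, B)

   MAXERR = MAXVAL(ABS(C - CREF))
   PRINT *, 'max abs difference:', MAXERR
   ! -ffast-math may reorder the sums, so allow a small tolerance.
   IF (MAXERR > 1.0D-12) STOP 1

CONTAINS

   ! Same kernel as in compare.f90, repeated to keep the sketch standalone.
   SUBROUTINE HARD_NN_12_12_12(C, A, B)
      REAL(KIND=8), INTENT(INOUT) :: C(12,*)
      REAL(KIND=8), INTENT(IN)    :: B(12,*), A(12,*)
      INTEGER :: i, j, l
      DO j = 1, 12 ; DO i = 1, 12 ; DO l = 1, 12
         C(i,j) = C(i,j) + A(i,l)*B(l,j)
      ENDDO ; ENDDO ; ENDDO
   END SUBROUTINE HARD_NN_12_12_12

END PROGRAM CHECK_HARD_NN
```

Building this once at -O2 and once at -O3 with the other flags unchanged would show whether both variants agree numerically. It says nothing about their speed.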

However, this only happens with:

gfortran-trunk -O2 -funroll-loops -ftree-vectorize -ffast-math -march=corei7
-msse4.2  compare.f90

while switching from -O2 to -O3 produces 'bad' code:

gfortran-trunk -O3 -funroll-loops -ftree-vectorize -ffast-math -march=corei7
-msse4.2  compare.f90

With the tester below:

-O2 runs in about 4.4s
-O3 runs in about 7.0s

> cat test_compare.f90 
      REAL(KIND=8), DIMENSION(12,12) :: A,B,C
      A=0 ; B=0 ; C=0
      DO I=1,10000000
         CALL HARD_NN_12_12_12(C,A,B)
      ENDDO
      END
