http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47298
           Summary: -O3 destroys beautifully vectorized code obtained at -O2
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: joost.vandevond...@pci.uzh.ch

Current trunk generates really fast vectorized code for the following testcase
(a 12x12x12 matrix multiply, c=c+a*b, benchmarked with a,b,c in cache), as can
be seen from the assembly:

> cat compare.f90
SUBROUTINE HARD_NN_12_12_12(C,A,B)
  REAL(KIND=8), INTENT(INOUT) :: C(12,*)
  REAL(KIND=8), INTENT(IN) :: B(12,*), A(12,*)
  INTEGER ::i,j,l
  DO j=1,12 ; DO i=1,12; DO l=1,12
    C(i,j)=C(i,j)+A(i,l)*B(l,j)
  ENDDO ; ENDDO ; ENDDO
END SUBROUTINE HARD_NN_12_12_12

However, this only happens with:

gfortran-trunk -O2 -funroll-loops -ftree-vectorize -ffast-math -march=corei7 -msse4.2 compare.f90

while switching from -O2 to -O3 produces 'bad' code:

gfortran-trunk -O3 -funroll-loops -ftree-vectorize -ffast-math -march=corei7 -msse4.2 compare.f90

With the tester below:

-O2 runs in about 4.4s
-O3 runs in about 7.0s

> cat test_compare.f90
REAL(KIND=8), DIMENSION(12,12) :: A,B,C
A=0 ; B=0 ; C=0
DO I=1,10000000
  CALL HARD_NN_12_12_12(C,12,A,12,B,12)
ENDDO
END
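
Note that the tester as posted calls HARD_NN_12_12_12 with six arguments, whereas the subroutine in compare.f90 declares three (C,A,B). The following is only a sketch of a self-consistent driver, not the reporter's original: it assumes the intended call is HARD_NN_12_12_12(C,A,B), and the MATMUL cross-check, the random inputs and the SYSTEM_CLOCK timing are additions for illustration. The kernel is repeated so the file compiles on its own.

```fortran
! Sketch of a self-consistent driver; NOT the reporter's original tester.
! The kernel is repeated from the report so this file compiles alone.
! For real measurements keep the kernel in its own file (compare.f90),
! so the compiler cannot inline it into the timing loop.
SUBROUTINE HARD_NN_12_12_12(C,A,B)
  REAL(KIND=8), INTENT(INOUT) :: C(12,*)
  REAL(KIND=8), INTENT(IN) :: B(12,*), A(12,*)
  INTEGER :: i,j,l
  DO j=1,12 ; DO i=1,12 ; DO l=1,12
    C(i,j)=C(i,j)+A(i,l)*B(l,j)
  ENDDO ; ENDDO ; ENDDO
END SUBROUTINE HARD_NN_12_12_12

PROGRAM test_compare
  IMPLICIT NONE
  REAL(KIND=8), DIMENSION(12,12) :: A,B,C,REF
  INTEGER :: I
  INTEGER(KIND=8) :: t0,t1,rate
  EXTERNAL :: HARD_NN_12_12_12

  ! Assumption: nonzero inputs, so that the check below is meaningful
  ! (the original tester uses A=0, B=0, C=0).
  CALL RANDOM_NUMBER(A)
  CALL RANDOM_NUMBER(B)

  ! Correctness check: one kernel call on C=0 should match MATMUL(A,B)
  ! up to rounding (the tolerance is an illustrative choice).
  C=0
  CALL HARD_NN_12_12_12(C,A,B)
  REF=MATMUL(A,B)
  IF (MAXVAL(ABS(C-REF)) > 1.0D-12) ERROR STOP "kernel disagrees with MATMUL"

  ! Timing loop with the iteration count from the report.
  C=0
  CALL SYSTEM_CLOCK(t0,rate)
  DO I=1,10000000
    CALL HARD_NN_12_12_12(C,A,B)
  ENDDO
  CALL SYSTEM_CLOCK(t1)

  ! Printing a checksum keeps the loop result live.
  PRINT *, "seconds:", REAL(t1-t0,KIND=8)/REAL(rate,KIND=8)
  PRINT *, "checksum:", SUM(C)
END PROGRAM test_compare
```

Building this once with each of the two command lines from the report (-O2 and -O3, other flags unchanged) gives a like-for-like comparison, and the MATMUL check guards against comparing a fast but wrong kernel.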