------- Comment #11 from jv244 at cam dot ac dot uk 2008-08-19 06:09 -------
Created an attachment (id=16095)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16095&action=view)
new testcase

This (PR31079_11.f90) should be a replacement for comment #4, and illustrates
the vectorizer issue.

> gfortran -O3 -ftree-vectorize -ffast-math -march=native PR31079_11.f90
> ./a.out
   4.0282512

> ifort -O3 -xT PR31079_11.f90
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: BLOCK WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(17): (col. 8) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(24): (col. 5) remark: BLOCK WAS VECTORIZED.
PR31079_11.f90(30): (col. 7) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(31): (col. 7) remark: LOOP WAS VECTORIZED.
> ./a.out
   2.640165

The inner loop looks like:

    DO i=1,N
      s(1:2)=s(1:2)+pxy(i)%a(:)*dpy(i)%a(1)
      s(3:4)=s(3:4)+pxy(i)%a(:)*dpy(i)%a(2)
    ENDDO

which ifort vectorizes (I will attach the full asm):

..B3.4:                         # Preds ..B3.4 ..B3.3
        movddup   collocate_core_2_2_0_0_$DPY.0.1(%rax), %xmm2    #30.33
        movddup   8+collocate_core_2_2_0_0_$DPY.0.1(%rax), %xmm4  #31.33
        movaps    collocate_core_2_2_0_0_$PXY.0.1(%rax), %xmm3    #30.7
        mulpd     %xmm3, %xmm2                                    #30.32
        incq      %rdx                                            #29.5
        addq      $16, %rax                                       #29.5
        addpd     %xmm2, %xmm1                                    #30.7
        cmpq      $1000, %rdx                                     #29.5
        mulpd     %xmm3, %xmm4                                    #31.32
        addpd     %xmm4, %xmm0                                    #31.7
        jl        ..B3.4        # Prob 99%                        #29.5

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079