------- Comment #5 from ubizjak at gmail dot com 2008-08-04 11:43 -------
Hm, the following testcase is not vectorized due to the vectorizer cost model
(-O2 -msse3 -ftree-vectorize -ffast-math) on the i686 target:

--cut here--
void testf(void)
{
  int i;

  for (i = 0; i < 16; i++)
    cf[i] = af[i] + bf[i];
}
--cut here--

Compilation reports:

pr30211.c:8: note: vectorization_factor = 2, niters = 16
pr30211.c:8: note: === vect_update_slp_costs_according_to_vf ===
pr30211.c:8: note: cost model: vector iteration cost = 16 is divisible by scalar iteration cost = 8 by a factor greater than or equal to the vectorization factor = 2 .
pr30211.c:8: note: not vectorized: vectorization not profitable.
pr30211.c:8: note: not vectorized: vector version will never be profitable.

However, without the cost model, the loop in this testcase compiles to:

.L2:
        movaps  bf(%eax), %xmm0
        addps   af(%eax), %xmm0
        movaps  %xmm0, cf(%eax)
        addl    $16, %eax
        cmpl    $128, %eax
        jne     .L2

which is IMO faster than the equivalent scalar version:

.L2:
        movss   bf+4(,%eax,8), %xmm1
        addss   af+4(,%eax,8), %xmm1
        movss   bf(,%eax,8), %xmm0
        addss   af(,%eax,8), %xmm0
        movss   %xmm0, cf(,%eax,8)
        movss   %xmm1, cf+4(,%eax,8)
        addl    $1, %eax
        cmpl    $16, %eax
        jne     .L2

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35252
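
For illustration, the "never be profitable" verdict in the notes above appears to come down to a simple comparison: one vector iteration does the work of VF scalar iterations, so it only pays off if it costs less than VF scalar iterations. The sketch below is a hypothetical helper written for this comment, not the actual GCC source; the function name and the exact form of the comparison are assumptions inferred from the wording of the cost model note.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (NOT GCC's code): returns true if the vector loop
   body can ever beat the scalar loop body.  The costs are the abstract
   per-iteration units printed in the vectorizer notes.  */
bool
vector_can_be_profitable (int scalar_iter_cost, int vector_iter_cost, int vf)
{
  assert (vf > 0);

  /* One vector iteration replaces VF scalar iterations, so it has to be
     strictly cheaper than VF scalar iterations (assumed comparison).  */
  return scalar_iter_cost * vf > vector_iter_cost;
}
```

With the numbers from the dump (scalar cost 8, vector cost 16, VF 2) the comparison is 8 * 2 > 16, which is false, so the loop is rejected regardless of the iteration count. Under this sketch, a vector iteration cost of 15 would already have been enough to pass the check.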