On 21 July 2011 15:19, Ira Rosen <ira.ro...@linaro.org> wrote:
> On 20 July 2011 21:35, Ulrich Weigand <uweig...@de.ibm.com> wrote:
>>
>> The return value of foo with vectorization is 1249 instead
>> of 1999 for some reason.
>
> I reproduced the failure. It occurs without Richard's
> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
> patches too. Obviously the vectorized loop is executed, but at the
> moment I don't understand why. I'll have a better look on Sunday.

Actually it doesn't choose the vectorized code. But the scalar version
gets optimized in a harmful way for SPU, AFAIU.
Here is the scalar loop after vrp2:

<bb 8>:
  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
  D.4593_42 = (void *) ivtmp.53_32;
  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
  D.4521_34 = D.4520_33 + 1;
  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
  ivtmp.42_45 = ivtmp.42_50 + 4;
  if (ivtmp.42_45 != 16)
    goto <bb 10>;
  else
    goto <bb 5>;

and the load is changed by dom2 to:

<bb 4>:
  ...
  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
  ...

where vector(4) int * vect_pa.9;

And the scalar loop has no rotate for that load:

.L3:
        lqd     $13,0($2)
        lqx     $11,$5,$3
        cwx     $7,$sp,$3
        ai      $12,$13,1
        shufb   $6,$12,$11,$7
        stqx    $6,$5,$3
        ai      $3,$3,4
        ceqi    $4,$3,16

I manually added rotqby for $13 and the result was correct (I changed
the test to iterate only 4 times to make things easier).

Ira

>
> Ira
>
>>
>> Bye,
>> Ulrich
>>
>> --
>> Dr. Ulrich Weigand
>> GNU Toolchain for Linux on System z and Cell BE
>> ulrich.weig...@de.ibm.com
>>
>
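
For readers less familiar with SPU scalar loads, the effect of the missing
rotate can be modeled in portable C. This is only a rough sketch, not GCC or
SPU code: the names quad_t, lqd_model, rotqby_model, preferred_slot and
load_scalar are hypothetical and exist only for illustration. The model
assumes that lqd ignores the low four address bits and loads the whole aligned
quadword, that a scalar int is read from bytes 0..3 of the register (the
preferred slot), and that rotqby rotates the quadword left by a byte count.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit SPU register. */
typedef struct { uint8_t b[16]; } quad_t;

/* Model of lqd: the low four address bits are ignored, so the whole
   aligned quadword containing addr is loaded. */
static quad_t lqd_model(const uint8_t *mem, uint32_t addr)
{
    quad_t q;
    memcpy(q.b, mem + (addr & ~(uint32_t)15), sizeof q.b);
    return q;
}

/* Model of rotqby: rotate the quadword left by n bytes (n taken mod 16). */
static quad_t rotqby_model(quad_t q, uint32_t n)
{
    quad_t r;
    for (int i = 0; i < 16; i++)
        r.b[i] = q.b[(i + n) & 15];
    return r;
}

/* A scalar int is read from the preferred slot, bytes 0..3. */
static int32_t preferred_slot(quad_t q)
{
    int32_t v;
    memcpy(&v, q.b, sizeof v);
    return v;
}

/* Scalar load of the int at addr, with or without the rotate step. */
static int32_t load_scalar(const uint8_t *mem, uint32_t addr, int with_rotate)
{
    assert(addr % 4 == 0);            /* model only handles aligned ints */
    quad_t q = lqd_model(mem, addr);
    if (with_rotate)
        q = rotqby_model(q, addr & 15);
    return preferred_slot(q);
}
```

With the four ints 10, 20, 30, 40 stored in one quadword, load_scalar(mem, 8, 1)
yields 30, while load_scalar(mem, 8, 0) yields 10: without the rotate, the
scalar code always sees the first element of the quadword.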