On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.ro...@linaro.org> wrote:
> On 21 July 2011 15:19, Ira Rosen <ira.ro...@linaro.org> wrote:
>> On 20 July 2011 21:35, Ulrich Weigand <uweig...@de.ibm.com> wrote:
>>>
>>> The return value of foo with vectorization is 1249 instead
>>> of 1999 for some reason.
>>
>> I reproduced the failure. It occurs without Richard's
>> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>> patches too. Obviously the vectorized loop is executed, but at the
>> moment I don't understand why. I'll have a better look on Sunday.
>
> Actually it doesn't choose the vectorized code. But the scalar version
> gets optimized in a harmful way for SPU, AFAIU.
> Here is the scalar loop after vrp2
>
> <bb 8>:
>   # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>   D.4593_42 = (void *) ivtmp.53_32;
>   D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>   D.4521_34 = D.4520_33 + 1;
>   MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>   ivtmp.42_45 = ivtmp.42_50 + 4;
>   if (ivtmp.42_45 != 16)
>     goto <bb 10>;
>   else
>     goto <bb 5>;
>
> and the load is changed by dom2 to:
>
> <bb 4>:
>   ...
>   D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>   ...
>
> where vector(4) int * vect_pa.9;
>
> And the scalar loop has no rotate for that load:

Hum.  This smells like we are hiding something from the tree optimizers?

> .L3:
>         lqd     $13,0($2)
>         lqx     $11,$5,$3
>         cwx     $7,$sp,$3
>         ai      $12,$13,1
>         shufb   $6,$12,$11,$7
>         stqx    $6,$5,$3
>         ai      $3,$3,4
>         ceqi    $4,$3,16
>
>
> I manually added rotqby for $13 and the result was correct (I changed
> the test to iterate only 4 times to make the things easier).
>
> Ira
>
>>
>> Ira
>>
>>>
>>> Bye,
>>>     Ulrich
>>>
>>> --
>>>   Dr. Ulrich Weigand
>>>   GNU Toolchain for Linux on System z and Cell BE
>>>   ulrich.weig...@de.ibm.com
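Why the missing `rotqby` matters can be modelled in a few lines of C. This is a simplified sketch under stated assumptions, not GCC or SPU code. It assumes three things. First, `lqd`/`lqx` fetch the aligned 16-byte quadword containing the address, ignoring its low four bits. Second, scalar word arithmetic such as `ai $12,$13,1` operates on the preferred slot, bytes 0-3 of the register, big-endian. Third, `rotqby` rotates the quadword left by a byte count. All function names below are hypothetical. Without the rotate, every load in the loop yields the first word of the quadword rather than the word at the actual address. That would only be safe if the address really were 16-byte aligned. One plausible reading of the thread is that the `vector(4) int *` type of `vect_pa.9` leads the backend to assume exactly that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 16-byte SPU register / quadword. */
typedef struct { uint8_t b[16]; } qword;

/* Store a 32-bit value big-endian (SPU byte order) at a word-aligned address. */
void store_word(uint8_t *ls, size_t size, uint32_t addr, int32_t v) {
    assert(addr % 4 == 0 && addr + 4 <= size);
    uint32_t u = (uint32_t)v;
    ls[addr]     = (uint8_t)(u >> 24);
    ls[addr + 1] = (uint8_t)(u >> 16);
    ls[addr + 2] = (uint8_t)(u >> 8);
    ls[addr + 3] = (uint8_t)u;
}

/* Model of lqd/lqx: the low 4 address bits are dropped, so the whole
   aligned quadword containing addr is loaded. */
static qword load_quadword(const uint8_t *ls, size_t size, uint32_t addr) {
    uint32_t base = addr & ~15u;
    assert((size_t)base + 16 <= size);
    qword q;
    memcpy(q.b, ls + base, 16);
    return q;
}

/* Model of rotqby: rotate left by n bytes (only the low 4 bits count). */
static qword rotate_bytes(qword q, uint32_t n) {
    qword r;
    for (int i = 0; i < 16; i++)
        r.b[i] = q.b[(i + n) & 15];
    return r;
}

/* Read the 32-bit word in the preferred slot (bytes 0-3, big-endian). */
static int32_t preferred_word(qword q) {
    return (int32_t)((uint32_t)q.b[0] << 24 | (uint32_t)q.b[1] << 16 |
                     (uint32_t)q.b[2] << 8 | (uint32_t)q.b[3]);
}

/* Like the .L3 loop above: lqd with no rotate before the scalar add. */
int32_t load_word_no_rotate(const uint8_t *ls, size_t size, uint32_t addr) {
    return preferred_word(load_quadword(ls, size, addr));
}

/* With the rotate added: the addressed word moves into the preferred slot. */
int32_t load_word_with_rotate(const uint8_t *ls, size_t size, uint32_t addr) {
    return preferred_word(rotate_bytes(load_quadword(ls, size, addr), addr & 15));
}
```

As a usage example, store 10, 20, 30, 40 at addresses 0, 4, 8 and 12. `load_word_with_rotate` then returns the word at each address. `load_word_no_rotate` returns 10 for all four, because each load reads the first word of the same quadword. That matches Ira's observation that adding `rotqby` for `$13` fixed the result.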