On 21 July 2011 15:19, Ira Rosen <[email protected]> wrote:
> On 20 July 2011 21:35, Ulrich Weigand <[email protected]> wrote:
>>
>> The return value of foo with vectorization is 1249 instead
>> of 1999 for some reason.
>
> I reproduced the failure. It occurs without Richard's patch
> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and without
> these patches too. Obviously the vectorized loop is executed, but at the
> moment I don't understand why. I'll take a better look on Sunday.
Actually, it doesn't take the vectorized path. Rather, as far as I
understand, the scalar version gets optimized in a way that is harmful
for SPU.
Here is the scalar loop after vrp2:
<bb 8>:
# ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
D.4593_42 = (void *) ivtmp.53_32;
D.4520_33 = MEM[base: D.4593_42, offset: 0B];
D.4521_34 = D.4520_33 + 1;
MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
ivtmp.42_45 = ivtmp.42_50 + 4;
if (ivtmp.42_45 != 16)
goto <bb 10>;
else
goto <bb 5>;
Then dom2 changes the load to:
<bb 4>:
...
D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
...
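where vect_pa.9 is declared as vector(4) int * vect_pa.9;
Since lqd only loads the aligned 16-byte quadword containing the address,
a scalar int load normally needs a rotqby to move the word into the
preferred slot. The scalar loop, however, has no rotate for that load: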
.L3:
lqd $13,0($2)
lqx $11,$5,$3
cwx $7,$sp,$3
ai $12,$13,1
shufb $6,$12,$11,$7
stqx $6,$5,$3
ai $3,$3,4
ceqi $4,$3,16
I manually added a rotqby for $13 and the result was correct (I changed
the test to iterate only 4 times to make things easier).
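A toy C model of the SPU load semantics may help show why the missing
rotate matters. It is only an illustration under stated assumptions, not
GCC or SPU SDK code: the helper names (load_quad, rotqby, preferred_word,
store_be32, load_word) are hypothetical, the local store is modelled as a
plain byte array, and registers are big-endian 16-byte values whose
preferred word slot is bytes 0..3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a 128-bit SPU register (illustrative only). */
typedef struct { uint8_t b[16]; } qword;

/* Models lqd/lqx: fetch the aligned quadword containing addr;
   the low 4 bits of the address are ignored. */
static qword load_quad(const uint8_t *ls, uint32_t addr) {
    qword q;
    memcpy(q.b, ls + (addr & ~15u), sizeof q.b);
    return q;
}

/* Models rotqby: rotate the quadword left by (count & 15) bytes. */
static qword rotqby(qword q, uint32_t count) {
    qword r;
    uint32_t n = count & 15u;
    for (uint32_t i = 0; i < 16; i++)
        r.b[i] = q.b[(i + n) & 15u];
    return r;
}

/* Scalar instructions such as ai operate on the preferred slot
   (bytes 0..3, big-endian). */
static uint32_t preferred_word(qword q) {
    return (uint32_t)q.b[0] << 24 | (uint32_t)q.b[1] << 16 |
           (uint32_t)q.b[2] << 8 | (uint32_t)q.b[3];
}

/* Hypothetical helper: write a big-endian word into the local-store image. */
static void store_be32(uint8_t *ls, uint32_t addr, uint32_t v) {
    ls[addr] = (uint8_t)(v >> 24);
    ls[addr + 1] = (uint8_t)(v >> 16);
    ls[addr + 2] = (uint8_t)(v >> 8);
    ls[addr + 3] = (uint8_t)v;
}

/* Scalar int load at addr, with or without the rotqby step. */
static uint32_t load_word(const uint8_t *ls, uint32_t addr, int rotate) {
    qword q = load_quad(ls, addr);
    if (rotate)
        q = rotqby(q, addr); /* the rotate count is the address itself */
    return preferred_word(q);
}
```

With the ints 10, 20, 30, 40 stored at local-store addresses 16..31,
load_word(ls, 20, 1) yields 20, while load_word(ls, 20, 0) yields 10, the
first word of the quadword. Only for 16-byte-aligned addresses do the two
agree, which is why dropping the rotate is only safe when the address is
known to be aligned.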
Ira
>
> Ira
>
>>
>> Bye,
>> Ulrich
>>
>> --
>> Dr. Ulrich Weigand
>> GNU Toolchain for Linux on System z and Cell BE
>> [email protected]
>>
>