On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen <ira.ro...@linaro.org> wrote:
> On 21 July 2011 15:19, Ira Rosen <ira.ro...@linaro.org> wrote:
>> On 20 July 2011 21:35, Ulrich Weigand <uweig...@de.ibm.com> wrote:
>>>
>>> The return value of foo with vectorization is 1249 instead
>>> of 1999 for some reason.
>>
>> I reproduced the failure. It also occurs without Richard's patch
>> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and without
>> this patch. Obviously the vectorized loop is executed, but at the
>> moment I don't understand why. I'll have a better look on Sunday.
>
> Actually it doesn't choose the vectorized code. But the scalar version
> gets optimized in a harmful way for SPU, AFAIU.
> Here is the scalar loop after vrp2
>
> <bb 8>:
>  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
>  D.4593_42 = (void *) ivtmp.53_32;
>  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>  D.4521_34 = D.4520_33 + 1;
>  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>  ivtmp.42_45 = ivtmp.42_50 + 4;
>  if (ivtmp.42_45 != 16)
>    goto <bb 10>;
>  else
>    goto <bb 5>;
>
> and the load is changed by dom2 to:
>
> <bb 4>:
>  ...
>  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>   ...
>
> where vector(4) int * vect_pa.9;
>
> And the scalar loop has no rotate for that load:

Hmm.  This smells like we are hiding something from the tree optimizers?

> .L3:
>        lqd     $13,0($2)
>        lqx     $11,$5,$3
>        cwx     $7,$sp,$3
>        ai      $12,$13,1
>        shufb   $6,$12,$11,$7
>        stqx    $6,$5,$3
>        ai      $3,$3,4
>        ceqi    $4,$3,16
>
>
> I manually added a rotqby for $13 and the result was correct (I changed
> the test to iterate only 4 times to make things easier).
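
To illustrate why the missing rotate matters, here is a rough C model of
an SPU scalar word load. It is only a sketch under stated assumptions:
the names model_lqd, model_rotqby and preferred_word are hypothetical
helpers made up for this example (they are not GCC or SPU intrinsics),
and the model only captures that lqd ignores the low four address bits
and that a scalar word is read from the preferred slot (bytes 0..3).

```c
#include <stdint.h>
#include <string.h>

/* One 16-byte SPU register / memory quadword (hypothetical model type). */
typedef struct { uint8_t b[16]; } quad_t;

/* Model of lqd: the low four address bits are ignored, so the load
   always returns the aligned quadword containing addr. */
static quad_t model_lqd(const uint8_t *mem, uint32_t addr)
{
    quad_t q;
    memcpy(q.b, mem + (addr & ~15u), 16);
    return q;
}

/* Model of rotqby: rotate the quadword left by (count & 15) bytes. */
static quad_t model_rotqby(quad_t q, uint32_t count)
{
    quad_t r;
    for (int i = 0; i < 16; i++)
        r.b[i] = q.b[(i + count) & 15];
    return r;
}

/* Scalar word value as seen in the preferred slot (bytes 0..3, big-endian). */
static uint32_t preferred_word(quad_t q)
{
    return ((uint32_t)q.b[0] << 24) | ((uint32_t)q.b[1] << 16) |
           ((uint32_t)q.b[2] << 8)  |  (uint32_t)q.b[3];
}

/* What the loop above effectively does: lqd with no rotate. */
static uint32_t load_word_no_rotate(const uint8_t *mem, uint32_t addr)
{
    return preferred_word(model_lqd(mem, addr));
}

/* What a correct scalar load needs: lqd followed by rotqby. */
static uint32_t load_word_with_rotate(const uint8_t *mem, uint32_t addr)
{
    return preferred_word(model_rotqby(model_lqd(mem, addr), addr));
}
```

In this model, with the words 10, 20, 30, 40 stored at offsets 0, 4, 8
and 12 of one aligned quadword, the unrotated load at offset 4 returns
10 (element 0 of the quadword), while the rotated load returns 20.
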
>
> Ira
>
>>
>> Ira
>>
>>>
>>> Bye,
>>> Ulrich
>>>
>>> --
>>>  Dr. Ulrich Weigand
>>>  GNU Toolchain for Linux on System z and Cell BE
>>>  ulrich.weig...@de.ibm.com
>>>
>>
>
