On 21 July 2011 15:19, Ira Rosen <ira.ro...@linaro.org> wrote:
> On 20 July 2011 21:35, Ulrich Weigand <uweig...@de.ibm.com> wrote:
>>
>> The return value of foo with vectorization is 1249 instead
>> of 1999 for some reason.
>
> I reproduced the failure. It also occurs without Richard's patch
> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and without
> this patch. Obviously the vectorized loop is executed, but at the
> moment I don't understand why. I'll have a better look on Sunday.

Actually, it doesn't choose the vectorized code. But the scalar version
gets optimized in a way that is harmful for SPU, AFAIU.
Here is the scalar loop after vrp2:

<bb 8>:
  # ivtmp.42_50 = PHI <ivtmp.42_59(3), ivtmp.42_45(10)>
  D.4593_42 = (void *) ivtmp.53_32;
  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
  D.4521_34 = D.4520_33 + 1;
  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
  ivtmp.42_45 = ivtmp.42_50 + 4;
  if (ivtmp.42_45 != 16)
    goto <bb 10>;
  else
    goto <bb 5>;

and the load is changed by dom2 to:

<bb 4>:
  ...
  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
  ...

where vector(4) int * vect_pa.9;

And the scalar loop has no rotate for that load:

.L3:
        lqd     $13,0($2)
        lqx     $11,$5,$3
        cwx     $7,$sp,$3
        ai      $12,$13,1
        shufb   $6,$12,$11,$7
        stqx    $6,$5,$3
        ai      $3,$3,4
        ceqi    $4,$3,16


I manually added a rotqby for $13 and the result was correct (I changed
the test to iterate only 4 times to make things easier).
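
To illustrate why the missing rotate matters, here is a minimal C sketch
of the load semantics as I understand them: lqd ignores the low 4 bits of
the address and fetches the whole aligned quadword, so a scalar int only
lands in the preferred slot (bytes 0..3) after a rotqby by the byte
offset. This is a model, not the SPU backend code; LS_SIZE and all
function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LS_SIZE 32u /* hypothetical toy local store size, in bytes */

typedef struct { uint8_t b[16]; } quad_t;

/* Model of lqd: the low 4 bits of the address are ignored, so the
   enclosing 16-byte-aligned quadword is always loaded. */
static quad_t load_quad(const uint8_t *mem, uint32_t addr)
{
    quad_t q;
    uint32_t base = addr & ~15u;
    assert(base + 16u <= LS_SIZE);
    memcpy(q.b, mem + base, 16);
    return q;
}

/* Model of rotqby: rotate the quadword left by (count & 15) bytes. */
static quad_t rotate_left_bytes(quad_t q, uint32_t count)
{
    quad_t r;
    for (uint32_t i = 0; i < 16; i++)
        r.b[i] = q.b[(i + count) & 15u];
    return r;
}

/* Read the preferred scalar slot (bytes 0..3, big-endian). */
static uint32_t preferred_word(quad_t q)
{
    return (uint32_t)q.b[0] << 24 | (uint32_t)q.b[1] << 16 |
           (uint32_t)q.b[2] << 8 | (uint32_t)q.b[3];
}

/* Store a 32-bit word in big-endian order (helper for setting up data). */
static void store_word_be(uint8_t *mem, uint32_t addr, uint32_t v)
{
    mem[addr] = (uint8_t)(v >> 24);
    mem[addr + 1] = (uint8_t)(v >> 16);
    mem[addr + 2] = (uint8_t)(v >> 8);
    mem[addr + 3] = (uint8_t)v;
}

/* What the loop above effectively does: load, no rotate. */
static uint32_t scalar_load_no_rotate(const uint8_t *mem, uint32_t addr)
{
    return preferred_word(load_quad(mem, addr));
}

/* What a correct scalar load needs: load, then rotate by the offset. */
static uint32_t scalar_load_with_rotate(const uint8_t *mem, uint32_t addr)
{
    return preferred_word(rotate_left_bytes(load_quad(mem, addr), addr));
}
```

In this model, reading the int at byte offset 8 without the rotate returns
the int at offset 0 of the same quadword, while the rotated version
returns the intended element.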

Ira

>
> Ira
>
>>
>> Bye,
>> Ulrich
>>
>> --
>>  Dr. Ulrich Weigand
>>  GNU Toolchain for Linux on System z and Cell BE
>>  ulrich.weig...@de.ibm.com
>>
>
