Re: [Patch, RTL] Eliminate redundant vec_select moves.

Richard Sandiford Thu, 07 Nov 2013 06:48:32 -0800

Tejas Belagod <tbela...@arm.com> writes:
> Richard Sandiford wrote:
>> Tejas Belagod <tbela...@arm.com> writes:
>>> Richard Sandiford wrote:
>>>> Tejas Belagod <tbela...@arm.com> writes:
>>>>> Richard Sandiford wrote:
>>>>>> Tejas Belagod <tbela...@arm.com> writes:
>>>>>>> +  /* This is big-endian-safe because the elements are kept in target
>>>>>>> + memory order.  So, for eg. PARALLEL element value of 2 is the
>>>>>>> same in
>>>>>>> +     either endian-ness.  */
>>>>>>> +  if (GET_CODE (src) == VEC_SELECT
>>>>>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>>>>>> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
>>>>>>> +    {
>>>>>>> +      rtx par = XEXP (src, 1);
>>>>>>> +      int i;
>>>>>>> +
>>>>>>> +      for (i = 0; i < XVECLEN (par, 0); i++)
>>>>>>> +       {
>>>>>>> +         rtx tem = XVECEXP (par, 0, i);
>>>>>>> +         if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>>>>>>> +           return 0;
>>>>>>> +       }
>>>>>>> +      return 1;
>>>>>>> +    }
>>>>>>> +
>>>>>> I think for big endian it needs to be:
>>>>>>
>>>>>>     INTVAL (tem) != i + base
>>>>>>
>>>>>> where base is something like:
>>>>>>
>>>>>>     int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN 
>>>>>> (par, 0);
>>>>>>
>>>>>> E.g. a big-endian V4HI looks like:
>>>>>>
>>>>>>     msb          lsb
>>>>>>     0000111122223333
>>>>>>
>>>>>> and shortening it to say V2HI only gives the low 32 bits:
>>>>>>
>>>>>>             msb  lsb
>>>>>>             22223333
>>>>> But, in this case we want
>>>>>
>>>>>          msb  lsb
>>>>>          00001111
>>>> It depends on whether the result occupies a full register or not.
>>>> I was thinking of the case where it didn't, but I realise now you were
>>>> thinking of the case where it did.  And yeah, my suggestion doesn't
>>>> cope with that...
>>>>
>>>>> I was under the impression that the const vector parallel for vec_select 
>>>>> represents the element indexes of the array in memory order.
>>>>>
>>>>> Therefore, in bigendian,
>>>>>
>>>>>           msb             lsb
>>>>>           0000 1111 2222 3333
>>>>> element  a[0] a[1] a[2] a[3]
>>>>>
>>>>> and in littleendian
>>>>>
>>>>>           msb             lsb
>>>>>           3333 2222 1111 0000
>>>>> element  a[3] a[2] a[1] a[0]
>>>> Right.  But if an N-bit value is stored in a register, it's assumed to
>>>> occupy the lsb of the register and the N-1 bits above that.  The other
>>>> bits in the register are don't-care.
>>>>
>>>> E.g., leaving vectors to one side, if you have:
>>>>
>>>>    (set (reg:HI N) (truncate:SI (reg:SI N)))
>>>>
>>>> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
>>>>
>>>>    msb  lsb
>>>>    01234567
>>>>        VVVV
>>>>    xxxx4567
>>>>
>>>> rather than:
>>>>
>>>>    msb  lsb
>>>>    01234567
>>>>    VVVV
>>>>    0123xxxx
>>>>
>>>> for both endiannesses.  The same principle applies to vectors.
>>>> The lsb of the register is always assumed to be significant.
>>>>
>>>> So maybe the original patch was correct for partial-register and
>>>> full-register results on little-endian, but only for full-register
>>>> results on big-endian.
>>> Ah, ok! I think I get it. By eliminating
>>>     set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
>>>
>>> using the check INTVAL (tem) != i, I'm essentially making subsequent
>>> operations
>>> use (reg:V2DI n) in DI mode which is a partial register result and this
>>> gives me
>>> the wrong set of lanes in bigendian. So, if I want to use (reg n) in 
>>> partial 
>>> register mode, I have to make sure the correct elements coincide with
>>> the lsb in
>>> big-endian...
>>>
>>> Thanks for your input, I'll apply the offset correction for big-endian you 
>>> suggested. I'll respin the patch.
>> 
>> Thanks.  Just for avoidance of doubt, the result might be a full or
>> partial register, depending on the mode and target.  I was trying to
>> correct myself by agreeing that your original was right and mine was
>> wrong for big-endian if the result is a full register.
>> 
>
> What I had in mind when I implemented this was a partial-reg result, but 
> obviously it was wrong.
>
> Sorry, I'm going to take a step back - I'm trying to figure out what a full 
> register result would look like. Looking at the pattern,
>
>    set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
>
> the result is always a mode smaller than the src if the vec_select selects a 
> subset of lanes. Plus if the src and dst are the same reg, because we're 
> re-writing the src reg, wouldn't it always end up being a partial-reg?
> In this case, wouldn't (reg:DI n) always represent
>
>       msb  lsb
>       22223333


The problem is that one reg rtx can span several hard registers.
E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32),
but it might instead represent two 32-bit registers (nos. 32 and 33).
Obviously the latter's not very likely for vectors this small,
but more likely for larger ones (including on NEON IIRC).

So if we had 2 32-bit registers being treated as a V4HI, it would be:

   <--32--><--33-->
   msb          lsb
   0000111122223333
   VVVVVVVV
   00001111
   msb  lsb
   <--32-->

for big endian and:

   <--33--><--32-->
   msb          lsb
   3333222211110000
           VVVVVVVV
           11110000
           msb  lsb
           <--32-->

for little endian.

Thanks,
Richard

Re: [Patch, RTL] Eliminate redundant vec_select moves.

Reply via email to