Tejas Belagod <tbela...@arm.com> writes:
> Richard Sandiford wrote:
>> Tejas Belagod <tbela...@arm.com> writes:
>>> Richard Sandiford wrote:
>>>> Tejas Belagod <tbela...@arm.com> writes:
>>>>> +  /* This is big-endian-safe because the elements are kept in target
>>>>> +     memory order.  So, for eg. PARALLEL element value of 2 is the same 
>>>>> in
>>>>> +     either endian-ness.  */
>>>>> +  if (GET_CODE (src) == VEC_SELECT
>>>>> +      && REG_P (XEXP (src, 0)) && REG_P (dst)
>>>>> +      && REGNO (XEXP (src, 0)) == REGNO (dst))
>>>>> +    {
>>>>> +      rtx par = XEXP (src, 1);
>>>>> +      int i;
>>>>> +
>>>>> +      for (i = 0; i < XVECLEN (par, 0); i++)
>>>>> + {
>>>>> +   rtx tem = XVECEXP (par, 0, i);
>>>>> +   if (!CONST_INT_P (tem) || INTVAL (tem) != i)
>>>>> +     return 0;
>>>>> + }
>>>>> +      return 1;
>>>>> +    }
>>>>> +
>>>> I think for big endian it needs to be:
>>>>
>>>>     INTVAL (tem) != i + base
>>>>
>>>> where base is something like:
>>>>
>>>>     int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 
>>>> 0);
>>>>
>>>> E.g. a big-endian V4HI looks like:
>>>>
>>>>     msb          lsb
>>>>     0000111122223333
>>>>
>>>> and shortening it to say V2HI only gives the low 32 bits:
>>>>
>>>>             msb  lsb
>>>>             22223333
>>> But, in this case we want
>>>
>>>          msb  lsb
>>>          00001111
>> 
>> It depends on whether the result occupies a full register or not.
>> I was thinking of the case where it didn't, but I realise now you were
>> thinking of the case where it did.  And yeah, my suggestion doesn't
>> cope with that...
>> 
>>> I was under the impression that the const vector parallel for vec_select 
>>> represents the element indexes of the array in memory order.
>>>
>>> Therefore, in bigendian,
>>>
>>>           msb             lsb
>>>           0000 1111 2222 3333
>>> element  a[0] a[1] a[2] a[3]
>>>
>>> and in littleendian
>>>
>>>           msb             lsb
>>>           3333 2222 1111 0000
>>> element  a[3] a[2] a[1] a[0]
>> 
>> Right.  But if an N-bit value is stored in a register, it's assumed to
>> occupy the lsb of the register and the N-1 bits above that.  The other
>> bits in the register are don't-care.
>> 
>> E.g., leaving vectors to one side, if you have:
>> 
>>    (set (reg:HI N) (truncate:SI (reg:SI N)))
>> 
>> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
>> 
>>    msb  lsb
>>    01234567
>>        VVVV
>>    xxxx4567
>> 
>> rather than:
>> 
>>    msb  lsb
>>    01234567
>>    VVVV
>>    0123xxxx
>> 
>> for both endiannesses.  The same principle applies to vectors.
>> The lsb of the register is always assumed to be significant.
>> 
>> So maybe the original patch was correct for partial-register and
>> full-register results on little-endian, but only for full-register
>> results on big-endian.
>
> Ah, ok! I think I get it. By eliminating
>       set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0]))))
>
> using the check INTVAL (tem) != i, I'm essentially making subsequent 
> operations 
> use (reg:V2DI n) in DI mode which is a partial register result and this
> gives me
> the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial 
> register mode, I have to make sure the correct elements coincide with
> the lsb in
> big-endian...
>
> Thanks for your input, I'll apply the offset correction for big-endian you 
> suggested. I'll respin the patch.

Thanks.  Just for avoidance of doubt, the result might be a full or
partial register, depending on the mode and target.  I was trying to
correct myself by agreeing that your original was right and mine was
wrong for big-endian if the result is a full register.

I don't know if there are existing helper functions for this kind of thing.

Richard

Reply via email to