Tejas Belagod <tbela...@arm.com> writes: > Richard Sandiford wrote: >> Tejas Belagod <tbela...@arm.com> writes: >>> Richard Sandiford wrote: >>>> Tejas Belagod <tbela...@arm.com> writes: >>>>> + /* This is big-endian-safe because the elements are kept in target >>>>> + memory order. So, for eg. PARALLEL element value of 2 is the same >>>>> in >>>>> + either endian-ness. */ >>>>> + if (GET_CODE (src) == VEC_SELECT >>>>> + && REG_P (XEXP (src, 0)) && REG_P (dst) >>>>> + && REGNO (XEXP (src, 0)) == REGNO (dst)) >>>>> + { >>>>> + rtx par = XEXP (src, 1); >>>>> + int i; >>>>> + >>>>> + for (i = 0; i < XVECLEN (par, 0); i++) >>>>> + { >>>>> + rtx tem = XVECEXP (par, 0, i); >>>>> + if (!CONST_INT_P (tem) || INTVAL (tem) != i) >>>>> + return 0; >>>>> + } >>>>> + return 1; >>>>> + } >>>>> + >>>> I think for big endian it needs to be: >>>> >>>> INTVAL (tem) != i + base >>>> >>>> where base is something like: >>>> >>>> int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, >>>> 0); >>>> >>>> E.g. a big-endian V4HI looks like: >>>> >>>> msb lsb >>>> 0000111122223333 >>>> >>>> and shortening it to say V2HI only gives the low 32 bits: >>>> >>>> msb lsb >>>> 22223333 >>> But, in this case we want >>> >>> msb lsb >>> 00001111 >> >> It depends on whether the result occupies a full register or not. >> I was thinking of the case where it didn't, but I realise now you were >> thinking of the case where it did. And yeah, my suggestion doesn't >> cope with that... >> >>> I was under the impression that the const vector parallel for vec_select >>> represents the element indexes of the array in memory order. >>> >>> Therefore, in bigendian, >>> >>> msb lsb >>> 0000 1111 2222 3333 >>> element a[0] a[1] a[2] a[3] >>> >>> and in littleendian >>> >>> msb lsb >>> 3333 2222 1111 0000 >>> element a[3] a[2] a[1] a[0] >> >> Right. But if an N-bit value is stored in a register, it's assumed to >> occupy the lsb of the register and the N-1 bits above that. The other >> bits in the register are don't-care. >> >> E.g., leaving vectors to one side, if you have: >> >> (set (reg:HI N) (truncate:SI (reg:SI N))) >> >> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this: >> >> msb lsb >> 01234567 >> VVVV >> xxxx4567 >> >> rather than: >> >> msb lsb >> 01234567 >> VVVV >> 0123xxxx >> >> for both endiannesses. The same principle applies to vectors. >> The lsb of the register is always assumed to be significant. >> >> So maybe the original patch was correct for partial-register and >> full-register results on little-endian, but only for full-register >> results on big-endian. > > Ah, ok! I think I get it. By eliminating > set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0])))) > > using the check INTVAL (tem) != i, I'm essentially making subsequent > operations > use (reg:V2DI n) in DI mode which is a partial register result and this > gives me > the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial > register mode, I have to make sure the correct elements coincide with > the lsb in > big-endian... > > Thanks for your input, I'll apply the offset correction for big-endian you > suggested. I'll respin the patch.
Thanks. Just for avoidance of doubt, the result might be a full or partial register, depending on the mode and target. I was trying to correct myself by agreeing that your original was right and mine was wrong for big-endian if the result is a full register. I don't know if there are existing helper functions for this kind of thing. Richard