Tejas Belagod <tbela...@arm.com> writes: > Richard Sandiford wrote: >> Tejas Belagod <tbela...@arm.com> writes: >>> Richard Sandiford wrote: >>>> Tejas Belagod <tbela...@arm.com> writes: >>>>> Richard Sandiford wrote: >>>>>> Tejas Belagod <tbela...@arm.com> writes: >>>>>>> + /* This is big-endian-safe because the elements are kept in target >>>>>>> + memory order. So, for eg. PARALLEL element value of 2 is the >>>>>>> same in >>>>>>> + either endian-ness. */ >>>>>>> + if (GET_CODE (src) == VEC_SELECT >>>>>>> + && REG_P (XEXP (src, 0)) && REG_P (dst) >>>>>>> + && REGNO (XEXP (src, 0)) == REGNO (dst)) >>>>>>> + { >>>>>>> + rtx par = XEXP (src, 1); >>>>>>> + int i; >>>>>>> + >>>>>>> + for (i = 0; i < XVECLEN (par, 0); i++) >>>>>>> + { >>>>>>> + rtx tem = XVECEXP (par, 0, i); >>>>>>> + if (!CONST_INT_P (tem) || INTVAL (tem) != i) >>>>>>> + return 0; >>>>>>> + } >>>>>>> + return 1; >>>>>>> + } >>>>>>> + >>>>>> I think for big endian it needs to be: >>>>>> >>>>>> INTVAL (tem) != i + base >>>>>> >>>>>> where base is something like: >>>>>> >>>>>> int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN >>>>>> (par, 0); >>>>>> >>>>>> E.g. a big-endian V4HI looks like: >>>>>> >>>>>> msb lsb >>>>>> 0000111122223333 >>>>>> >>>>>> and shortening it to say V2HI only gives the low 32 bits: >>>>>> >>>>>> msb lsb >>>>>> 22223333 >>>>> But, in this case we want >>>>> >>>>> msb lsb >>>>> 00001111 >>>> It depends on whether the result occupies a full register or not. >>>> I was thinking of the case where it didn't, but I realise now you were >>>> thinking of the case where it did. And yeah, my suggestion doesn't >>>> cope with that... >>>> >>>>> I was under the impression that the const vector parallel for vec_select >>>>> represents the element indexes of the array in memory order. >>>>> >>>>> Therefore, in bigendian, >>>>> >>>>> msb lsb >>>>> 0000 1111 2222 3333 >>>>> element a[0] a[1] a[2] a[3] >>>>> >>>>> and in littleendian >>>>> >>>>> msb lsb >>>>> 3333 2222 1111 0000 >>>>> element a[3] a[2] a[1] a[0] >>>> Right. But if an N-bit value is stored in a register, it's assumed to >>>> occupy the lsb of the register and the N-1 bits above that. The other >>>> bits in the register are don't-care. >>>> >>>> E.g., leaving vectors to one side, if you have: >>>> >>>> (set (reg:HI N) (truncate:SI (reg:SI N))) >>>> >>>> on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this: >>>> >>>> msb lsb >>>> 01234567 >>>> VVVV >>>> xxxx4567 >>>> >>>> rather than: >>>> >>>> msb lsb >>>> 01234567 >>>> VVVV >>>> 0123xxxx >>>> >>>> for both endiannesses. The same principle applies to vectors. >>>> The lsb of the register is always assumed to be significant. >>>> >>>> So maybe the original patch was correct for partial-register and >>>> full-register results on little-endian, but only for full-register >>>> results on big-endian. >>> Ah, ok! I think I get it. By eliminating >>> set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0])))) >>> >>> using the check INTVAL (tem) != i, I'm essentially making subsequent >>> operations >>> use (reg:V2DI n) in DI mode which is a partial register result and this >>> gives me >>> the wrong set of lanes in bigendian. So, if I want to use (reg n) in >>> partial >>> register mode, I have to make sure the correct elements coincide with >>> the lsb in >>> big-endian... >>> >>> Thanks for your input, I'll apply the offset correction for big-endian you >>> suggested. I'll respin the patch. >> >> Thanks. Just for avoidance of doubt, the result might be a full or >> partial register, depending on the mode and target. I was trying to >> correct myself by agreeing that your original was right and mine was >> wrong for big-endian if the result is a full register. >> > > What I had in mind when I implemented this was a partial-reg result, but > obviously it was wrong. > > Sorry, I'm going to take a step back - I'm trying to figure out what a full > register result would look like. Looking at the pattern, > > set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0])))) > > the result is always a mode smaller than the src if the vec_select selects a > subset of lanes. Plus if the src and dst are the same reg, because we're > re-writing the src reg, wouldn't it always end up being a partial-reg? > In this case, wouldn't (reg:DI n) always represent > > msb lsb > 22223333
The problem is that one reg rtx can span several hard registers. E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32), but it might instead represent two 32-bit registers (nos. 32 and 33). Obviously the latter's not very likely for vectors this small, but more likely for larger ones (including on NEON IIRC). So if we had 2 32-bit registers being treated as a V4HI, it would be: <--32--><--33--> msb lsb 0000111122223333 VVVVVVVV 00001111 msb lsb <--32--> for big endian and: <--33--><--32--> msb lsb 3333222211110000 VVVVVVVV 11110000 msb lsb <--32--> for little endian. Thanks, Richard