On Thu, Nov 17, 2022 at 5:39 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Hongtao Liu <crazy...@gmail.com> writes: > > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> > >> Tamar Christina <tamar.christ...@arm.com> writes: > >> >> -----Original Message----- > >> >> From: Hongtao Liu <crazy...@gmail.com> > >> >> Sent: Tuesday, November 15, 2022 9:37 AM > >> >> To: Tamar Christina <tamar.christ...@arm.com> > >> >> Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina via > >> >> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; > >> >> rguent...@suse.de > >> >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors > >> >> from > >> >> arbitrary element position inside a vector > >> >> > >> >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina > >> >> <tamar.christ...@arm.com> wrote: > >> >> > > >> >> > > -----Original Message----- > >> >> > > From: Hongtao Liu <crazy...@gmail.com> > >> >> > > Sent: Tuesday, November 15, 2022 8:36 AM > >> >> > > To: Tamar Christina <tamar.christ...@arm.com> > >> >> > > Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina > >> >> > > via Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; > >> >> > > rguent...@suse.de > >> >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of > >> >> > > subvectors from arbitrary element position inside a vector > >> >> > > > >> >> > > Hi: > >> >> > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- > >> >> > > November/606040.html. > >> >> > > > } > >> >> > > > > >> >> > > > /* See if we can get a better vector mode before extracting. > >> >> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index > >> >> > > > > >> >> > > > >> >> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600 > >> >> > > 9 > >> >> > > 0 > >> >> > > > a453cc6a28d9 100644 > >> >> > > > --- a/gcc/optabs.cc > >> >> > > > +++ b/gcc/optabs.cc > >> >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode > >> >> mode, > >> >> > > rtx v0, rtx v1, > >> >> > > > v0_qi = gen_lowpart (qimode, v0); > >> >> > > > v1_qi = gen_lowpart (qimode, v1); > >> >> > > > if (targetm.vectorize.vec_perm_const != NULL > >> >> > > > + && targetm.can_change_mode_class (mode, qimode, > >> >> > > > + ALL_REGS) > >> >> > > It looks like you want to guard gen_lowpart, shouldn't it be better > >> >> > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, > >> >> target_qi)). > >> >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not > >> >> > > to guard gen_lowpart. > >> >> > > >> >> > Hmm I don't think this is quite true, there are existing usages in > >> >> > expr.cc and rtanal.cc That do this and aren't part of RA. As I > >> >> > mentioned before for instance the canoncalization of vec_select to > >> >> > subreg > >> >> in rtlanal for instances uses this. > >> >> In theory, we need to iterate through all reg classes that can be > >> >> assigned for > >> >> both qimode and mode, if any regclass returns true for > >> >> targetm.can_change_mode_class, the bitcast(validate_subreg) should be > >> >> ok. > >> >> Here we just passed ALL_REGS. > >> > > >> > Yes, and most targets where this transformation is valid return true > >> > here. > >> > > >> > I've checked: > >> > * alpha > >> > * arm > >> > * aarch64 > >> > * rs6000 > >> > * s390 > >> > * sparc > >> > * pa > >> > * mips > >> > > >> > And even the default example that other targets use from the > >> > documentation > >> > would return true as the size of the modes are the same. > >> > > >> > X86 and RISCV are the only two targets that I found (but didn't check > >> > all) that > >> > blankly return a result based on just the register classes. > >> > > >> > That is to say, there are more targets that adhere to the interpretation > >> > that > >> > rclass here means "should be possible in some class in rclass" rather > >> > than > >> > "should be possible in ALL classes of rclass". > >> > >> Yeah, I agree. A query "can something stored in ALL_REGS change from > >> mode M1 to mode M2?" is meaningful if at least one register R in ALL_REGS > >> can hold both M1 and M2. It's then the target's job to answer > >> conservatively so that the result covers all such R. > >> > >> In principle it's OK for a target to err on the side of caution and forbid > >> things that are actually OK. But that's going to risk losing performance > >> in some cases, and sometimes that loss of performance will be unacceptable. > >> IMO that's what's happening here. The target is applying x87 rules to > >> things that (AIUI) are never stored in x87 registers, and so losing > > Yes, it can be optimized since some mode will never assigned to x87 > > registers. > >> performance as a result. > >> > >> Note that the RA also uses ALL_REGS for some things, so this usage > >> isn't specific to non-RA code. > > RA passes the minimal reg class(REGNO_REG_CLASS) which contains REGN > > to decide if can_change_mode_class, not ALL_REGS. > > 511/* Given a hard REGN a FROM mode and a TO mode, return true if > > 512 REGN can change from mode FROM to mode TO. */ > > 513#define REG_CAN_CHANGE_MODE_P(REGN, FROM, TO) \ > > 514 (targetm.can_change_mode_class (FROM, TO, REGNO_REG_CLASS (REGN))) > > 515 > > > > So I still think using can_change_mode_class outside of RA with > > ALL_REGS passed to decide whether it's ok to generate subreg is not a > > good idea. > > But if the argument is that the only valid uses of can_change_mode_class > are through this macro, the hook isn't describing a class property, > it's describing the property of individual registers. If we say that > querying individual registers is the only valid thing to do them > we should change the hook to take a register number rather than > a class enum. > > The reason we have a class-based rather than register-based interface > is because it is useful to query classes before you've picked a > specific register. For individual registers in the minimal reg class, we assume they are not different from each other, I guess that's why we have REGNO_REG_CLASS and class-based interfaces other than register-based interfaces. But for ALL_REGS, it's not the minimal reg class, it's the largest. Using it It's not that suitable. If the argument is if some r in rclass is ok for mode change, the hook would return true, then why would RA use REGNO_REG_CLASS other than ALL_REGS. Another spot is in validate_subreg, we're using the minimal reg class instead of ALL_REGS. 973 /* This is a normal subreg. Verify that the offset is representable. */ 974 975 /* For hard registers, we already have most of these rules collected in 976 subreg_offset_representable_p. */ 977 if (reg && REG_P (reg) && HARD_REGISTER_P (reg)) 978 { 979 unsigned int regno = REGNO (reg); 980 981 if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode)) 982 && GET_MODE_INNER (imode) == omode) 983 ; 984 else if (!REG_CAN_CHANGE_MODE_P (regno, imode, omode)) 985 return false;
I think we do need some hook in the middle end to query things like if some r in rclass is ok for mode change? but not reusing can_change_mode_class. > > Thanks, > Richard > > >> IMO it's not the job of target-independent code to iterate through > >> individual classes and aggregate the result. One of the reasons for > >> having union classes is to avoid the need to do that. And ALL_REGS > >> is the ultimate union class. :-) > >> > >> The patch looks correct to me. > >> > >> Thanks, > >> Richard > >> > >> >> > > >> >> > So there are already existing precedence for this. And the > >> >> > documentation for the hook says: > >> >> > > >> >> > "This hook returns true if it is possible to bitcast values held in > >> >> > registers of > >> >> class rclass from mode from to mode to and if doing so preserves the > >> >> low- > >> >> order bits that are common to both modes. The result is only meaningful > >> >> if > >> >> rclass has registers that can hold both from and to. The default > >> >> implementation returns true" > >> >> > > >> >> > So it looks like it's use outside of RA is perfectly valid.. and the > >> >> > documentation also mentions in the example the use from the mid-end as > >> >> an example. > >> >> > > >> >> > But if the mid-end maintainers are happy I'll use something else. > >> >> > > >> >> > Tamar > >> >> > > >> >> > > I did similar things in > >> >> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html > >> >> > > (and ALL_REGS doesn't cover all cases for registers which are both > >> >> > > available for qimode and mode, ALL_REGS fail doesn't mean it can't > >> >> > > be subreg, it just means parts of ALL_REGS can't be subreg. but with > >> >> > > a subset of ALL_REGS, there could be a reg class which return true > >> >> > > for > >> >> > > targetm.can_change_mode_class) > >> >> > > > && targetm.vectorize.vec_perm_const (qimode, qimode, > >> >> > > > target_qi, > >> >> > > v0_qi, > >> >> > > > v1_qi, > >> >> > > > qimode_indices)) > >> >> > > > return gen_lowpart (mode, target_qi); @@ -6311,7 +6312,8 > >> >> > > > @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1, > >> >> > > > } > >> >> > > > > >> >> > > > if (qimode != VOIDmode > >> >> > > > - && selector_fits_mode_p (qimode, qimode_indices)) > >> >> > > > + && selector_fits_mode_p (qimode, qimode_indices) > >> >> > > > + && targetm.can_change_mode_class (mode, qimode, ALL_REGS)) > >> >> > > > { > >> >> > > > icode = direct_optab_handler (vec_perm_optab, qimode); > >> >> > > > if (icode != CODE_FOR_nothing) diff --git > >> >> > > > a/gcc/testsuite/gcc.target/aarch64/ext_1.c > >> >> > > > b/gcc/testsuite/gcc.target/aarch64/ext_1.c > >> >> > > > new file mode 100644 > >> >> > > > index > >> >> > > > > >> >> > > > >> >> 0000000000000000000000000000000000000000..18a10a14f1161584267a8472e5 > >> >> > > 71 > >> >> > > > b3bc2ddf887a > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > -- > >> >> > > BR, > >> >> > > Hongtao > >> >> > >> >> > >> >> > >> >> -- > >> >> BR, > >> >> Hongtao -- BR, Hongtao