Tamar Christina <tamar.christ...@arm.com> writes: >> -----Original Message----- >> From: Richard Sandiford <richard.sandif...@arm.com> >> Sent: Wednesday, May 22, 2024 10:48 AM >> To: Tamar Christina <tamar.christ...@arm.com> >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw >> <richard.earns...@arm.com>; Marcus Shawcroft >> <marcus.shawcr...@arm.com>; ktkac...@gcc.gnu.org >> Subject: Re: [PATCH 3/4]AArch64: add new alternative with early clobber to >> patterns >> >> Tamar Christina <tamar.christ...@arm.com> writes: >> > Hi All, >> > >> > This patch adds new alternatives to the patterns which are affected. The >> > new >> > alternatives with the conditional early clobbers are added before the >> > normal >> > ones in order for LRA to prefer them in the event that we have enough free >> > registers to accommodate them. >> > >> > In case register pressure is too high the normal alternatives will be >> > preferred >> > before a reload is considered as we rather have the tie than a spill. >> > >> > Tests are in the next patch. >> > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >> > >> > Ok for master? >> > >> > Thanks, >> > Tamar >> > >> > gcc/ChangeLog: >> > >> > * config/aarch64/aarch64-sve.md (and<mode>3, >> > @aarch64_pred_<optab><mode>_z, *<optab><mode>3_cc, >> > *<optab><mode>3_ptest, aarch64_pred_<nlogical><mode>_z, >> > *<nlogical><mode>3_cc, *<nlogical><mode>3_ptest, >> > aarch64_pred_<logical_nn><mode>_z, *<logical_nn><mode>3_cc, >> > *<logical_nn><mode>3_ptest, @aarch64_pred_cmp<cmp_op><mode>, >> > *cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest, >> > @aarch64_pred_cmp<cmp_op><mode>_wide, >> > *aarch64_pred_cmp<cmp_op><mode>_wide_cc, >> > *aarch64_pred_cmp<cmp_op><mode>_wide_ptest, >> @aarch64_brk<brk_op>, >> > *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, >> > @aarch64_brk<brk_op>, *aarch64_brkn_cc, *aarch64_brkn_ptest, >> > *aarch64_brk<brk_op>_cc, *aarch64_brk<brk_op>_ptest, >> > aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest, >> > *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber >> > alternative. >> > * config/aarch64/aarch64-sve2.md >> > (@aarch64_pred_<sve_int_op><mode>): Likewise. >> > >> > --- >> > diff --git a/gcc/config/aarch64/aarch64-sve.md >> > b/gcc/config/aarch64/aarch64- >> sve.md >> > index >> e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..8fdc1bc3cd43acfcd675a18350c >> 297428c85fe46 100644 >> > --- a/gcc/config/aarch64/aarch64-sve.md >> > +++ b/gcc/config/aarch64/aarch64-sve.md >> > @@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z" >> > (reg:VNx16BI FFRT_REGNUM) >> > (match_operand:VNx16BI 1 "register_operand")))] >> > "TARGET_SVE && TARGET_NON_STREAMING" >> > - {@ [ cons: =0, 1 ] >> > - [ Upa , Upa ] rdffr\t%0.b, %1/z >> > + {@ [ cons: =0, 1 ; attrs: pred_clobber ] >> > + [ &Upa , Upa; yes ] rdffr\t%0.b, %1/z >> > + [ ?Upa , Upa; yes ] ^ >> > + [ Upa , Upa; * ] ^ >> > } >> > ) >> >> Sorry for not explaining it very well, but in the previous review I >> suggested: >> >> > The gather-like approach would be something like: >> > >> > [ &Upa , Upl , w , <sve_imm_con>; yes ] >> cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 >> > [ ?Upl , 0 , w , <sve_imm_con>; yes ] ^ >> > [ Upa , Upl , w , <sve_imm_con>; no ] ^ >> > [ &Upa , Upl , w , w ; yes ] >> > cmp<cmp_op>\t%0.<Vetype>, %1/z, >> %3.<Vetype>, %4.<Vetype> >> > [ ?Upl , 0 , w , w ; yes ] ^ >> > [ Upa , Upl , w , w ; no ] ^ >> > >> > with: >> > >> > (define_attr "pred_clobber" "any,no,yes" (const_string "any")) >> >> (with emphasis on the last line). What I didn't say explicitly is >> that "no" should require !TARGET_SVE_PRED_CLOBBER. >> >> The premise of that review was that we shouldn't enable things like: >> >> [ Upa , Upl , w , w ; no ] ^ >> >> for TARGET_SVE_PRED_CLOBBER since it contradicts the earlyclobber >> alternative. So we should enable either the pred_clobber=yes >> alternatives or the pred_clobber=no alternatives, but not both. >> >> The default "any" is then for other non-predicate instructions that >> don't care about TARGET_SVE_PRED_CLOBBER either way. >> >> In contrast, this patch makes pred_clobber=yes enable the alternatives >> that correctly describe the restriction (good!) but then also enables >> the normal alternatives too, which IMO makes the semantics unclear. > > Sure, the reason I still had that is because this ICEs under high register > pressure: > > {@ [ cons: =0 , 1 , 3 , 4 ; attrs: pred_clobber ] > [ &Upa , Upl , w , <sve_imm_con>; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, #%4 > [ ?Upa , 0 , w , <sve_imm_con>; yes ] ^ > [ Upa , Upl , w , <sve_imm_con>; no ] ^ > [ &Upa , Upl , w , w ; yes ] > cmp<cmp_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype> > [ ?Upa , 0 , w , w ; yes ] ^ > [ Upa , Upl , w , w ; no ] ^ > } > > So now in the `yes` case reload does: > > Considering alt=0 of insn 10: (0) =&Upa (1) Upl (3) w (4) vsd > 0 Small class reload: reject+=3 > 0 Non input pseudo reload: reject++ > 0 Early clobber: reject++ > Bad operand -- refuse > Considering alt=1 of insn 10: (0) ?Upa (1) 0 (3) w (4) vsd > Staticly defined alt reject+=6 > 0 Small class reload: reject+=3 > 0 Non input pseudo reload: reject++ > 1 Dying matched operand reload: reject++ > 1 Small class reload: reject+=3 > Bad operand -- refuse > Considering alt=3 of insn 10: (0) &Upa (1) Upl (3) w (4) w > 0 Small class reload: reject+=3 > 0 Non input pseudo reload: reject++ > 0 Early clobber: reject++ > overall=11,losers=1,rld_nregs=1 > Considering alt=4 of insn 10: (0) ?Upa (1) 0 (3) w (4) w > Staticly defined alt reject+=6 > 0 Small class reload: reject+=3 > 0 Non input pseudo reload: reject++ > overall=16,losers=1 -- refuse > Choosing alt 3 in insn 10: (0) &Upa (1) Upl (3) w (4) w > {aarch64_pred_cmplovnx8hi} > > And the penalty of alt=4 makes it pick alt=3 even though it doesn't have the > free registers > for it. alt=4 would have worked.
By "high register pressure", do you mean if predicate registers are disabled using -ffixed? If so, that's ok in itself. That kind of ICE shouldn't occur in real use. > I believe this now follows exactly what was suggested: > > 1. provide an early clobber alternative > 2. provide an explicit tie alternative with an increase in cost for using it > 3. provide a general/normal alternative that is only enabled when the first > two aren't. > > Having read the email a number of times.. did I somehow miss something? But how is (3) arranged? It looks like the normal alternative is enabled unconditionally, in the sense that the "enabled" attribute is always "yes". Thanks, Richard