> Am 30.09.2022 um 16:29 schrieb Richard Sandiford via Gcc-patches > <gcc-patches@gcc.gnu.org>: > > Tamar Christina <tamar.christ...@arm.com> writes: >>> -----Original Message----- >>> From: Richard Biener <rguent...@suse.de> >>> Sent: Friday, September 30, 2022 12:53 PM >>> To: Tamar Christina <tamar.christ...@arm.com> >>> Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina via >>> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; Jeff Law >>> <jeffreya...@gmail.com> >>> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional >>> branches, give hint if argument is a truth type to backend >>> >>>> On Fri, 30 Sep 2022, Tamar Christina wrote: >>> >>>>> -----Original Message----- >>>>> From: Richard Biener <rguent...@suse.de> >>>>> Sent: Friday, September 30, 2022 11:17 AM >>>>> To: Tamar Christina <tamar.christ...@arm.com> >>>>> Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina >>>>> via Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; Jeff >>> Law >>>>> <jeffreya...@gmail.com> >>>>> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional >>>>> branches, give hint if argument is a truth type to backend >>>>> >>>>> On Fri, 30 Sep 2022, Tamar Christina wrote: >>>>> >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Richard Sandiford <richard.sandif...@arm.com> >>>>>>> Sent: Friday, September 30, 2022 9:49 AM >>>>>>> To: Tamar Christina <tamar.christ...@arm.com> >>>>>>> Cc: Richard Biener <rguent...@suse.de>; Tamar Christina via >>>>>>> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; Jeff >>> Law >>>>>>> <jeffreya...@gmail.com> >>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of >>>>>>> conditional branches, give hint if argument is a truth type to >>>>>>> backend >>>>>>> >>>>>>> Tamar Christina <tamar.christ...@arm.com> writes: >>>>>>>>> -----Original Message----- >>>>>>>>> From: Richard Sandiford <richard.sandif...@arm.com> >>>>>>>>> Sent: Friday, September 30, 2022 9:29 AM >>>>>>>>> To: Tamar Christina <tamar.christ...@arm.com> >>>>>>>>> Cc: Richard Biener <rguent...@suse.de>; Tamar Christina via >>>>>>>>> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; Jeff >>>>> Law >>>>>>>>> <jeffreya...@gmail.com> >>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of >>>>>>>>> conditional branches, give hint if argument is a truth type >>>>>>>>> to backend >>>>>>>>> >>>>>>>>> Tamar Christina <tamar.christ...@arm.com> writes: >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Gcc-patches <gcc-patches- >>>>>>>>>>> bounces+tamar.christina=arm....@gcc.gnu.org> On Behalf Of >>>>>>> Richard >>>>>>>>>>> Biener via Gcc-patches >>>>>>>>>>> Sent: Thursday, September 29, 2022 12:09 PM >>>>>>>>>>> To: Tamar Christina via Gcc-patches >>>>>>>>>>> <gcc-patches@gcc.gnu.org> >>>>>>>>>>> Cc: Richard Sandiford <richard.sandif...@arm.com>; nd >>>>>>> <n...@arm.com> >>>>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of >>>>>>>>>>> conditional branches, give hint if argument is a truth >>>>>>>>>>> type to backend >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Am 29.09.2022 um 12:23 schrieb Tamar Christina via >>>>>>>>>>>> Gcc-patches >>>>>>>>>>>> <gcc- >>>>>>>>>>> patc...@gcc.gnu.org>: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: Richard Biener <rguent...@suse.de> >>>>>>>>>>>>> Sent: Thursday, September 29, 2022 10:41 AM >>>>>>>>>>>>> To: Richard Sandiford <richard.sandif...@arm.com> >>>>>>>>>>>>> Cc: Jeff Law <jeffreya...@gmail.com>; Tamar Christina >>>>>>>>>>>>> <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; nd >>>>>>>>>>> <n...@arm.com> >>>>>>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion >>>>>>>>>>>>> of conditional branches, give hint if argument is a >>>>>>>>>>>>> truth type to backend >>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, 29 Sep 2022, Richard Sandiford wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jeff Law <jeffreya...@gmail.com> writes: >>>>>>>>>>>>>>> On 9/28/22 09:04, Richard Sandiford wrote: >>>>>>>>>>>>>>>> Tamar Christina <tamar.christ...@arm.com> writes: >>>>>>>>>>>>>>>>>> Maybe the target could use (subreg:SI (reg:BI >>>>>>>>>>>>>>>>>> ...)) as >>>>>>> argument. >>>>>>>>>>> Heh. >>>>>>>>>>>>>>>>> But then I'd still need to change the expansion >>>>>>>>>>>>>>>>> code. I suppose this could prevent the issue with >>>>>>>>>>>>>>>>> changes to code on >>>>>>>>> other targets. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> We have undocumented addcc, negcc, etc. >>>>>>>>>>>>>>>>>>>> patterns, >>>>>>> should >>>>>>>>> we >>>>>>>>>>>>>>>>>>>> have aandcc >>>>>>>>>>>>>>>>> pattern for this indicating support for andcc + >>>>>>>>>>>>>>>>> jump as opposedto >>>>>>>>>>>>> cmpcc + jump? >>>>>>>>>>>>>>>>>>> This could work yeah. I didn't know these existed. >>>>>>>>>>>>>>>>>> Ah, so they are conditional add, not add setting >>>>>>>>>>>>>>>>>> CC, so andcc wouldn't be appropriate. >>>>>>>>>>>>>>>>>> So I'm not sure how we'd handle such situation - >>>>>>>>>>>>>>>>>> maybe looking at REG_DECL and recognizing a _Bool >>>>>>>>>>>>>>>>>> PARM_DECL is >>>>>>> OK? >>>>>>>>>>>>>>>>> I have a slight suspicion that Richard Sandiford >>>>>>>>>>>>>>>>> would likely reject this though.. >>>>>>>>>>>>>>>> Good guess :-P We shouldn't rely on something like >>>>>>>>>>>>>>>> that for >>>>>>>>>>>>> correctness. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Would it help if we promoted the test-and-branch >>>>>>>>>>>>>>>> instructions to optabs, alongside cbranch? The jump >>>>>>>>>>>>>>>> expanders could then target it >>>>>>>>>>>>> directly. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> IMO that'd be a reasonable thing to do if it does help. >>>>>>>>>>>>>>>> It's a relatively common operation, especially on >>>>>>>>>>>>>>>> CISCy >>>>> targets. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But don't we represent these single bit tests using >>>>>>>>>>>>>>> zero_extract as the condition of the branch? I guess >>>>>>>>>>>>>>> if we can generate them directly rather than waiting >>>>>>>>>>>>>>> for combine to deduce that we're dealing with a >>>>>>>>>>>>>>> single bit test and constructing the zero_extract >>>>>>>>>>>>>>> form would be an improvement and might help aarch at >>>>>>>>>>>>>>> the same >>>>>>>>> time. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Do you mean that the promote_mode stuff should use >>>>>>>>>>>>>> ext(z)v rather than zero_extend to promote a bool, where >>> available? >>>>>>>>>>>>>> If so, I agree that might help. But it sounds like it >>>>>>>>>>>>>> would have downsides >>>>>>>>> too. >>>>>>>>>>>>>> Currently a bool memory can be zero-extended on the >>>>>>>>>>>>>> fly using a load, but if we used the zero_extract form >>>>>>>>>>>>>> instead, we'd have to extract the bit after the load. >>>>>>>>>>>>>> And (as an >>>>>>>>>>>>>> alternative) choosing different behaviour based on >>>>>>>>>>>>>> whether expand sees a REG or a MEM sounds like it >>>>>>>>>>>>>> could still cause problems, since REGs could be >>>>>>>>>>>>>> replaced by MEMs (or vice versa) >>>>>>> later in the RTL passes. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ISTM that the original patch was inserting an extra >>>>>>>>>>>>>> operation in the branch expansion in order to target a >>>>>>>>>>>>>> specific >>>>> instruction. >>>>>>>>>>>>>> Targeting the instruction in expand seems good, but >>>>>>>>>>>>>> IMO we should do it directly, based on knowledge of >>>>>>>>>>>>>> whether the instruction actually >>>>>>>>>>> exists. >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, I think a compare-and-branch pattern is the best fit >>> here. >>>>>>>>>>>>> Note on GIMPLE we'd rely on the fact this is a >>>>>>>>>>>>> BOOLEAN_TYPE (so even 8 bit precision bools only have 1 >>>>>>>>>>>>> and 0 as meaningful >>>>> values). >>>>>>>>>>>>> So the 'compare-' bit in compare-and-branch would be >>>>>>>>>>>>> interpreting a BOOLEAN_TYPE, not so much a general >>> compare. >>>>>>>>>>>> >>>>>>>>>>>> Oh, I was thinking of adding a constant argument >>>>>>>>>>>> representing the precision that is relevant for the >>>>>>>>>>>> compare in order to make this a bit more >>>>>>>>>>> general/future proof. >>>>>>>>>>>> >>>>>>>>>>>> Are you thinking I should instead just make the optab >>>>>>>>>>>> implicitly only work for 1-bit precision comparisons? >>>>>>>>>>> >>>>>>>>>>> What?s the optab you propose (cite also the documentation >>> part)? >>>>>>>>>> >>>>>>>>>> tbranchmode5 >>>>>>>>>> Conditional branch instruction combined with a bit test >>> instruction. >>>>>>>>> Operand 0 is a comparison operator. >>>>>>>>>> Operand 1 and Operand 2 are the first and second operands >>>>>>>>>> of the >>>>>>>>> comparison, respectively. >>>>>>>>>> Operand 3 is the number of low-order bits that are >>>>>>>>>> relevant for the >>>>>>>>> comparison. >>>>>>>>>> Operand 4 is the code_label to jump to. >>>>>>>>> >>>>>>>>> For the TB instructions (and for other similar instructions >>>>>>>>> that I've seen on other architectures) it would be more >>>>>>>>> useful to have a single-bit test, with operand 4 specifying the bit >>> position. >>>>>>>>> Arguably it might then be better to have separate eq and ne >>>>>>>>> optabs, to avoid the awkward doubling of the operands >>>>>>>>> (operand 1 contains >>>>>>> operands 2 and 3). >>>>>>>>> >>>>>>>>> I guess a more general way of achieving the same thing would >>>>>>>>> be to make operand 4 in the optab above a mask rather than a bit >>> count. >>>>>>>>> But that might be overly general, if there are no known >>>>>>>>> architectures that have such an instruction. >>>>>>>> >>>>>>>> One of the reasons I wanted a range rather than a single bit >>>>>>>> is that I can the use this to generate cbz/cbnz early on as well. >>>>>>> >>>>>>> We already have the opportunity to do that via cbranch<mode>4. >>>>>>> But at the moment aarch64.md always forces the separate >>>>>>> comparison instead. (Not sure why TBH. Does it enable more >>>>>>> ifcvt >>>>>>> opportunities?) >>>>>>> >>>>>>> If we change the body of cbranch<mode>4 to: >>>>>>> >>>>>>> if ((GET_CODE (operands[0]) != EQ && GET_CODE (operands[0]) != >>> NE) >>>>>>> || operands[2] != const0_rtx) >>>>>>> { >>>>>>> operands[1] = aarch64_gen_compare_reg (GET_CODE >>> (operands[0]), >>>>>>> operands[1], operands[2]); >>>>>>> operands[2] = const0_rtx; >>>>>>> } >>>>>>> >>>>>>> then we generate the cbz/cbnz directly. >>>>>>> >>>>>> >>>>>> Ah ok, then if Richi agrees, bitpos it is then instead of bit count. >>>>> >>>>> Somehow I understood that cbranch<>4 is already fully capable of the >>>>> optimization? >>>>> >>>>> On your earlier proposal I'd have commented that if it wouldn't make >>>>> sense to instead have a CCmode setter instead of an expander with a >>> branch label? >>>>> That would be a bit test, like {sign,zero}_extract compared against >>>>> zero which can then be combined with a branch? >>>>> >>>> >>>> I missed that part, that could work too. >>>> >>>>> Of course if the ISA is really bit-test-and-branch then cbranch<>4 >>>>> itself or a variant of it might be more useful. Maybe >>>>> cbranchbi4 would be "abused" for this? >>>> >>>> The instruction is an actual bit-test-and-branch with any arbitrary bitpos. >>>> Yes we can abuse cbranchbi4 for this, but then it also means we can't e.g. >>>> use this to optimize a < 0 where a is a signed value. With the new >>>> optab this would just be a bit-test-and-branch of the sign bit. >>>> >>>> But also I'm not entirely convinced that using `BImode` and assuming a >>>> single bit is safe here. What happens if I compile my source with -std=c89? >>>> >>>> So I personally think the new optab makes more sense here. The CC setter >>> would work too. >>>> >>>> I guess my question is, between you folks, which approach would you >>>> like. It seems that Richi You'd like a CC setter. Richard do you have a >>> preference of one over the other? >>> >>> My order of preference is >>> >>> a) an existing pattern, if possible >>> b) something usable by N > 1 targets we know of, even if it requires some >>> combine magic >>> c) something that fits the actual ISA >>> >>> For b), x86 has BEXTR which performs a zero_extract from reg/mem and sets >>> ZF when the result is zero. For a branch on sign bit there's easier ways, >>> so it's >>> probably not very useful for general compare and branch optimization and if >>> (a & 0x30) is probably handled by combine(?). >> >> Agreed, I was more giving an example of other uses. But ok. >> >>> It also seems that if (a & 1) is handled for aarch64 already and it's just >>> a lack of >>> an optab that allows RTL expansion to expand if (bool) as if (bool & 1)? >> >> Yes this was what I was originally after. Your original suggestion was to >> abuse BImode >> and cbranch for this. That could work, but I worry about introducing BImode >> in the RTL, >> as I don't think we currently use it at all and wonder about new missed >> optimizations. >> >> Richard Sandiford, would this be OK with you? I also don't want to emit an >> extract here as I think >> that's gonna have a bigger effect on other targets than & 1 did. >> >> I would rather have something I can explicitly check for target support for. >> I also wonder if just a >> target hook asking the target how to expand a given tree Boolean argument >> would work. > > Personally I'd prefer the test-and-branch optab. We used to have > separate compare optabs and branch optabs in the cc0 days, but having > the combined pattern turned out to be much easier. We don't currently > expose CC registers at the optabs level and I think that's a good thing. > > We could have a test-bit-and-set-pseudo optab. But using that optab > here would reintroduce the problem that I had with the original patch, > which is: > > - Currently, the branch on the boolean value expands to a single optab > (cbranch<mode>4). That optab could easily expand to a single cb(n)z > instruction on aarch64 (even though the port chooses not to do that > currently). So taken on its own, the branch is already maximally > efficient. > > The patch instead adds an extra instruction to the branch expansion, > so that it uses two optabs rather than one. On the face of it, > this extra instruction is entirely redundant. But emitting it can > sometimes allow profitable combines. That is, rather than expand to: > > and reg, reg, 0xFF (from promote_mode) > cbranch on reg (from the branch expansion) > > the patch expanded to: > > and reg, reg, 0xFF (from promote_mode) > and reg, reg, 0x1 (from the branch expansion) > cbranch on reg (from the branch expansion) > > The first two ANDs come from separate sources. When both ANDs exist, > combine can get rid of the redundancy and can then see that the > retained AND 0x1 + cbranch optimise to a test-and-branch instruction. > > But if the target can't in fact use a test-and-branch in a particular > situation, we'd be left with the AND 0x1 and the cbranch. That's not > too bad if the AND from promote_mode and the AND 0x1 can be combined. > But if the AND with 0xff happens "too far" from the AND with 0x1 > (e.g. because it's in a loop preheader), or if there's intermediate > logic, we might end up keeping all three instructions. > > Emitting an extra bit test instruction as part of the branch expansion > IMO has the same problem. We're relying on combine combining the > (redundant) bit test with the cbranch. If combine doesn't do that > (e.g. because the target doesn't support the combination in certain > cases) then we could be left with three instructions rather than the > original two. > > That's why I think the fix is either for promote_mode to use a different > expansion for booleans (which could also have knock-on effects) or for > the branch expanders to target the test-and-branch instruction directly. > > The optab wouldn't be an aarch64 special. arc, avr, and h8300sx have > similar instructions. There are problem others too: h8300sx was from > memory and I stopped looking after avr. :-) Fair enough, let’s go with that then. Richard > Thanks, > Richard
Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend
Richard Biener via Gcc-patches Fri, 30 Sep 2022 07:33:25 -0700
- RE: [PATCH 1/2... Tamar Christina via Gcc-patches
- Re: [PATCH 1/2... Richard Sandiford via Gcc-patches
- RE: [PATCH 1/2... Tamar Christina via Gcc-patches
- Re: [PATCH 1/2... Richard Sandiford via Gcc-patches
- RE: [PATCH 1/2... Tamar Christina via Gcc-patches
- RE: [PATCH 1/2... Richard Biener via Gcc-patches
- RE: [PATCH 1/2... Tamar Christina via Gcc-patches
- RE: [PATCH 1/2... Richard Biener via Gcc-patches
- RE: [PATCH 1/2... Tamar Christina via Gcc-patches
- Re: [PATCH 1/2... Richard Sandiford via Gcc-patches
- Re: [PATCH 1/2... Richard Biener via Gcc-patches
- Re: [PATCH 1/2... Jeff Law via Gcc-patches
- Re: [PATCH 1/2]middle-end: RFC: On expan... Andrew Pinski via Gcc-patches