Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend

Richard Biener via Gcc-patches Fri, 30 Sep 2022 07:33:25 -0700

> Am 30.09.2022 um 16:29 schrieb Richard Sandiford via Gcc-patches 
> <gcc-patches@gcc.gnu.org>:
> 
> Tamar Christina <tamar.christ...@arm.com> writes:
>>> -----Original Message-----
>>> From: Richard Biener <rguent...@suse.de>
>>> Sent: Friday, September 30, 2022 12:53 PM
>>> To: Tamar Christina <tamar.christ...@arm.com>
>>> Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina via
>>> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; Jeff Law
>>> <jeffreya...@gmail.com>
>>> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>>> branches, give hint if argument is a truth type to backend
>>> 
>>>> On Fri, 30 Sep 2022, Tamar Christina wrote:
>>> 
>>>>> -----Original Message-----
>>>>> From: Richard Biener <rguent...@suse.de>
>>>>> Sent: Friday, September 30, 2022 11:17 AM
>>>>> To: Tamar Christina <tamar.christ...@arm.com>
>>>>> Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina
>>>>> via Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; Jeff
>>> Law
>>>>> <jeffreya...@gmail.com>
>>>>> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>>>>> branches, give hint if argument is a truth type to backend
>>>>> 
>>>>> On Fri, 30 Sep 2022, Tamar Christina wrote:
>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Richard Sandiford <richard.sandif...@arm.com>
>>>>>>> Sent: Friday, September 30, 2022 9:49 AM
>>>>>>> To: Tamar Christina <tamar.christ...@arm.com>
>>>>>>> Cc: Richard Biener <rguent...@suse.de>; Tamar Christina via
>>>>>>> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; Jeff
>>> Law
>>>>>>> <jeffreya...@gmail.com>
>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>>>>>>> conditional branches, give hint if argument is a truth type to
>>>>>>> backend
>>>>>>> 
>>>>>>> Tamar Christina <tamar.christ...@arm.com> writes:
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Richard Sandiford <richard.sandif...@arm.com>
>>>>>>>>> Sent: Friday, September 30, 2022 9:29 AM
>>>>>>>>> To: Tamar Christina <tamar.christ...@arm.com>
>>>>>>>>> Cc: Richard Biener <rguent...@suse.de>; Tamar Christina via
>>>>>>>>> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; Jeff
>>>>> Law
>>>>>>>>> <jeffreya...@gmail.com>
>>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>>>>>>>>> conditional branches, give hint if argument is a truth type
>>>>>>>>> to backend
>>>>>>>>> 
>>>>>>>>> Tamar Christina <tamar.christ...@arm.com> writes:
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Gcc-patches <gcc-patches-
>>>>>>>>>>> bounces+tamar.christina=arm....@gcc.gnu.org> On Behalf Of
>>>>>>> Richard
>>>>>>>>>>> Biener via Gcc-patches
>>>>>>>>>>> Sent: Thursday, September 29, 2022 12:09 PM
>>>>>>>>>>> To: Tamar Christina via Gcc-patches
>>>>>>>>>>> <gcc-patches@gcc.gnu.org>
>>>>>>>>>>> Cc: Richard Sandiford <richard.sandif...@arm.com>; nd
>>>>>>> <n...@arm.com>
>>>>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>>>>>>>>>>> conditional branches, give hint if argument is a truth
>>>>>>>>>>> type to backend
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> Am 29.09.2022 um 12:23 schrieb Tamar Christina via
>>>>>>>>>>>> Gcc-patches
>>>>>>>>>>>> <gcc-
>>>>>>>>>>> patc...@gcc.gnu.org>:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Richard Biener <rguent...@suse.de>
>>>>>>>>>>>>> Sent: Thursday, September 29, 2022 10:41 AM
>>>>>>>>>>>>> To: Richard Sandiford <richard.sandif...@arm.com>
>>>>>>>>>>>>> Cc: Jeff Law <jeffreya...@gmail.com>; Tamar Christina
>>>>>>>>>>>>> <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; nd
>>>>>>>>>>> <n...@arm.com>
>>>>>>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion
>>>>>>>>>>>>> of conditional branches, give hint if argument is a
>>>>>>>>>>>>> truth type to backend
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jeff Law <jeffreya...@gmail.com> writes:
>>>>>>>>>>>>>>> On 9/28/22 09:04, Richard Sandiford wrote:
>>>>>>>>>>>>>>>> Tamar Christina <tamar.christ...@arm.com> writes:
>>>>>>>>>>>>>>>>>> Maybe the target could use (subreg:SI (reg:BI
>>>>>>>>>>>>>>>>>> ...)) as
>>>>>>> argument.
>>>>>>>>>>> Heh.
>>>>>>>>>>>>>>>>> But then I'd still need to change the expansion
>>>>>>>>>>>>>>>>> code. I suppose this could prevent the issue with
>>>>>>>>>>>>>>>>> changes to code on
>>>>>>>>> other targets.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> We have undocumented addcc, negcc, etc.
>>>>>>>>>>>>>>>>>>>> patterns,
>>>>>>> should
>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>> have aandcc
>>>>>>>>>>>>>>>>> pattern for this indicating support for andcc +
>>>>>>>>>>>>>>>>> jump as opposedto
>>>>>>>>>>>>> cmpcc + jump?
>>>>>>>>>>>>>>>>>>> This could work yeah. I didn't know these existed.
>>>>>>>>>>>>>>>>>> Ah, so they are conditional add, not add setting
>>>>>>>>>>>>>>>>>> CC, so andcc wouldn't be appropriate.
>>>>>>>>>>>>>>>>>> So I'm not sure how we'd handle such situation -
>>>>>>>>>>>>>>>>>> maybe looking at REG_DECL and recognizing a _Bool
>>>>>>>>>>>>>>>>>> PARM_DECL is
>>>>>>> OK?
>>>>>>>>>>>>>>>>> I have a slight suspicion that Richard Sandiford
>>>>>>>>>>>>>>>>> would likely reject this though..
>>>>>>>>>>>>>>>> Good guess :-P  We shouldn't rely on something like
>>>>>>>>>>>>>>>> that for
>>>>>>>>>>>>> correctness.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Would it help if we promoted the test-and-branch
>>>>>>>>>>>>>>>> instructions to optabs, alongside cbranch?  The jump
>>>>>>>>>>>>>>>> expanders could then target it
>>>>>>>>>>>>> directly.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> IMO that'd be a reasonable thing to do if it does help.
>>>>>>>>>>>>>>>> It's a relatively common operation, especially on
>>>>>>>>>>>>>>>> CISCy
>>>>> targets.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> But don't we represent these single bit tests using
>>>>>>>>>>>>>>> zero_extract as the condition of the branch?  I guess
>>>>>>>>>>>>>>> if we can generate them directly rather than waiting
>>>>>>>>>>>>>>> for combine to deduce that we're dealing with a
>>>>>>>>>>>>>>> single bit test and constructing the zero_extract
>>>>>>>>>>>>>>> form would be an improvement and might help aarch at
>>>>>>>>>>>>>>> the same
>>>>>>>>> time.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Do you mean that the promote_mode stuff should use
>>>>>>>>>>>>>> ext(z)v rather than zero_extend to promote a bool, where
>>> available?
>>>>>>>>>>>>>> If so, I agree that might help.  But it sounds like it
>>>>>>>>>>>>>> would have downsides
>>>>>>>>> too.
>>>>>>>>>>>>>> Currently a bool memory can be zero-extended on the
>>>>>>>>>>>>>> fly using a load, but if we used the zero_extract form
>>>>>>>>>>>>>> instead, we'd have to extract the bit after the load.
>>>>>>>>>>>>>> And (as an
>>>>>>>>>>>>>> alternative) choosing different behaviour based on
>>>>>>>>>>>>>> whether expand sees a REG or a MEM sounds like it
>>>>>>>>>>>>>> could still cause problems, since REGs could be
>>>>>>>>>>>>>> replaced by MEMs (or vice versa)
>>>>>>> later in the RTL passes.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ISTM that the original patch was inserting an extra
>>>>>>>>>>>>>> operation in the branch expansion in order to target a
>>>>>>>>>>>>>> specific
>>>>> instruction.
>>>>>>>>>>>>>> Targeting the instruction in expand seems good, but
>>>>>>>>>>>>>> IMO we should do it directly, based on knowledge of
>>>>>>>>>>>>>> whether the instruction actually
>>>>>>>>>>> exists.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, I think a compare-and-branch pattern is the best fit
>>> here.
>>>>>>>>>>>>> Note on GIMPLE we'd rely on the fact this is a
>>>>>>>>>>>>> BOOLEAN_TYPE (so even 8 bit precision bools only have 1
>>>>>>>>>>>>> and 0 as meaningful
>>>>> values).
>>>>>>>>>>>>> So the 'compare-' bit in compare-and-branch would be
>>>>>>>>>>>>> interpreting a BOOLEAN_TYPE, not so much a general
>>> compare.
>>>>>>>>>>>> 
>>>>>>>>>>>> Oh, I was thinking of adding a constant argument
>>>>>>>>>>>> representing the precision that is relevant for the
>>>>>>>>>>>> compare in order to make this a bit more
>>>>>>>>>>> general/future proof.
>>>>>>>>>>>> 
>>>>>>>>>>>> Are you thinking I should instead just make the optab
>>>>>>>>>>>> implicitly only work for 1-bit precision comparisons?
>>>>>>>>>>> 
>>>>>>>>>>> What?s the optab you propose (cite also the documentation
>>> part)?
>>>>>>>>>> 
>>>>>>>>>> tbranchmode5
>>>>>>>>>>  Conditional branch instruction combined with a bit test
>>> instruction.
>>>>>>>>> Operand 0 is a comparison operator.
>>>>>>>>>>  Operand 1 and Operand 2 are the first and second operands
>>>>>>>>>> of the
>>>>>>>>> comparison, respectively.
>>>>>>>>>>  Operand 3 is the number of low-order bits that are
>>>>>>>>>> relevant for the
>>>>>>>>> comparison.
>>>>>>>>>>  Operand 4 is the code_label to jump to.
>>>>>>>>> 
>>>>>>>>> For the TB instructions (and for other similar instructions
>>>>>>>>> that I've seen on other architectures) it would be more
>>>>>>>>> useful to have a single-bit test, with operand 4 specifying the bit
>>> position.
>>>>>>>>> Arguably it might then be better to have separate eq and ne
>>>>>>>>> optabs, to avoid the awkward doubling of the operands
>>>>>>>>> (operand 1 contains
>>>>>>> operands 2 and 3).
>>>>>>>>> 
>>>>>>>>> I guess a more general way of achieving the same thing would
>>>>>>>>> be to make operand 4 in the optab above a mask rather than a bit
>>> count.
>>>>>>>>> But that might be overly general, if there are no known
>>>>>>>>> architectures that have such an instruction.
>>>>>>>> 
>>>>>>>> One of the reasons I wanted a range rather than a single bit
>>>>>>>> is that I can the use this to generate cbz/cbnz early on as well.
>>>>>>> 
>>>>>>> We already have the opportunity to do that via cbranch<mode>4.
>>>>>>> But at the moment aarch64.md always forces the separate
>>>>>>> comparison instead.  (Not sure why TBH.  Does it enable more
>>>>>>> ifcvt
>>>>>>> opportunities?)
>>>>>>> 
>>>>>>> If we change the body of cbranch<mode>4 to:
>>>>>>> 
>>>>>>>  if ((GET_CODE (operands[0]) != EQ && GET_CODE (operands[0]) !=
>>> NE)
>>>>>>>      || operands[2] != const0_rtx)
>>>>>>>    {
>>>>>>>      operands[1] = aarch64_gen_compare_reg (GET_CODE
>>> (operands[0]),
>>>>>>>                                           operands[1], operands[2]);
>>>>>>>      operands[2] = const0_rtx;
>>>>>>>    }
>>>>>>> 
>>>>>>> then we generate the cbz/cbnz directly.
>>>>>>> 
>>>>>> 
>>>>>> Ah ok, then if Richi agrees, bitpos it is then instead of bit count.
>>>>> 
>>>>> Somehow I understood that cbranch<>4 is already fully capable of the
>>>>> optimization?
>>>>> 
>>>>> On your earlier proposal I'd have commented that if it wouldn't make
>>>>> sense to instead have a CCmode setter instead of an expander with a
>>> branch label?
>>>>> That would be a bit test, like {sign,zero}_extract compared against
>>>>> zero which can then be combined with a branch?
>>>>> 
>>>> 
>>>> I missed that part, that could work too.
>>>> 
>>>>> Of course if the ISA is really bit-test-and-branch then cbranch<>4
>>>>> itself or a variant of it might be more useful.  Maybe
>>>>> cbranchbi4 would be "abused" for this?
>>>> 
>>>> The instruction is an actual bit-test-and-branch with any arbitrary bitpos.
>>>> Yes we can abuse cbranchbi4 for this, but then it also means we can't e.g.
>>>> use this to optimize a < 0 where a is a signed value.  With the new
>>>> optab this would just be a bit-test-and-branch of the sign bit.
>>>> 
>>>> But also I'm not entirely convinced that using `BImode` and assuming a
>>>> single bit is safe here. What happens if I compile my source with -std=c89?
>>>> 
>>>> So I personally think the new optab makes more sense here. The CC setter
>>> would work too.
>>>> 
>>>> I guess my question is, between you folks, which approach would you
>>>> like. It seems that Richi You'd like a CC setter. Richard do you have a
>>> preference of one over the other?
>>> 
>>> My order of preference is
>>> 
>>> a) an existing pattern, if possible
>>> b) something usable by N > 1 targets we know of, even if it requires some
>>> combine magic
>>> c) something that fits the actual ISA
>>> 
>>> For b), x86 has BEXTR which performs a zero_extract from reg/mem and sets
>>> ZF when the result is zero.  For a branch on sign bit there's easier ways, 
>>> so it's
>>> probably not very useful for general compare and branch optimization and if
>>> (a & 0x30) is probably handled by combine(?).
>> 
>> Agreed, I was more giving an example of other uses. But ok.
>> 
>>> It also seems that if (a & 1) is handled for aarch64 already and it's just 
>>> a lack of
>>> an optab that allows RTL expansion to expand if (bool) as if (bool & 1)?
>> 
>> Yes this was what I was originally after.  Your original suggestion was to 
>> abuse BImode
>> and cbranch for this.  That could work, but I worry about introducing BImode 
>> in the RTL,
>> as I don't think we currently use it at all and wonder about new missed 
>> optimizations.
>> 
>> Richard Sandiford, would this be OK with you? I also don't want to emit an 
>> extract here as I think
>> that's gonna have a bigger effect on other targets than & 1 did.
>> 
>> I would rather have something I can explicitly check for target support for. 
>> I also wonder if just a
>> target hook asking the target how to expand a given tree Boolean argument 
>> would work.
> 
> Personally I'd prefer the test-and-branch optab.  We used to have
> separate compare optabs and branch optabs in the cc0 days, but having
> the combined pattern turned out to be much easier.  We don't currently
> expose CC registers at the optabs level and I think that's a good thing.
> 
> We could have a test-bit-and-set-pseudo optab.  But using that optab
> here would reintroduce the problem that I had with the original patch,
> which is:
> 
> - Currently, the branch on the boolean value expands to a single optab
>  (cbranch<mode>4).  That optab could easily expand to a single cb(n)z
>  instruction on aarch64 (even though the port chooses not to do that
>  currently).  So taken on its own, the branch is already maximally
>  efficient.
> 
>  The patch instead adds an extra instruction to the branch expansion,
>  so that it uses two optabs rather than one.  On the face of it,
>  this extra instruction is entirely redundant.  But emitting it can
>  sometimes allow profitable combines.  That is, rather than expand to:
> 
>    and reg, reg, 0xFF (from promote_mode)
>    cbranch on reg     (from the branch expansion)
> 
>  the patch expanded to:
> 
>    and reg, reg, 0xFF (from promote_mode)
>    and reg, reg, 0x1  (from the branch expansion)
>    cbranch on reg     (from the branch expansion)
> 
>  The first two ANDs come from separate sources.  When both ANDs exist,
>  combine can get rid of the redundancy and can then see that the
>  retained AND 0x1 + cbranch optimise to a test-and-branch instruction.
> 
>  But if the target can't in fact use a test-and-branch in a particular
>  situation, we'd be left with the AND 0x1 and the cbranch.  That's not
>  too bad if the AND from promote_mode and the AND 0x1 can be combined.
>  But if the AND with 0xff happens "too far" from the AND with 0x1
>  (e.g. because it's in a loop preheader), or if there's intermediate
>  logic, we might end up keeping all three instructions.
> 
> Emitting an extra bit test instruction as part of the branch expansion
> IMO has the same problem.  We're relying on combine combining the
> (redundant) bit test with the cbranch.  If combine doesn't do that
> (e.g. because the target doesn't support the combination in certain
> cases) then we could be left with three instructions rather than the
> original two.
> 
> That's why I think the fix is either for promote_mode to use a different
> expansion for booleans (which could also have knock-on effects) or for
> the branch expanders to target the test-and-branch instruction directly.
> 
> The optab wouldn't be an aarch64 special.  arc, avr, and h8300sx have
> similar instructions.  There are problem others too: h8300sx was from
> memory and I stopped looking after avr. :-)

Fair enough, let’s go with that then.

Richard 


> Thanks,
> Richard
Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend

Reply via email to