> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Monday, September 26, 2022 1:43 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; jeffreya...@gmail.com;
> Richard Sandiford <richard.sandif...@arm.com>
> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> branches, give hint if argument is a truth type to backend
> 
> On Mon, 26 Sep 2022, Richard Biener wrote:
> 
> > On Mon, 26 Sep 2022, Tamar Christina wrote:
> >
> > > > Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
> > >
> > > But then I'd still need to change the expansion code. I suppose this could
> > > prevent the issue with changes to code on other targets.
> > >
> > > > > > We have undocumented addcc, negcc, etc. patterns, should we
> > > > > > have an andcc pattern for this, indicating support for andcc + jump
> > > > > > as opposed to cmpcc + jump?
> > > > >
> > > > > This could work yeah. I didn't know these existed.
> > >
> > > > Ah, so they are conditional add, not add setting CC, so andcc
> > > > wouldn't be appropriate.
> > >
> > > > So I'm not sure how we'd handle such situation - maybe looking at
> > > > REG_DECL and recognizing a _Bool PARM_DECL is OK?
> > >
> > > I have a slight suspicion that Richard Sandiford would likely reject this
> > > though. The additional AND seemed less hacky as it's just communicating
> > > range.
> > >
> > > I still need to also figure out which representation of bool is being
> > > used, because only the 0-1 variant works. Is there a way to check that?
> >
> > So another option would be, in case you have (subreg:SI (reg:QI)), if
> > we expand
> >
> >  if (b != 0)
> >
> > expand that to
> >
> >  !((b & 255) == 0)
> >
> > basically invert the comparison and then leverage the paradoxical
> > subreg to specify a narrower immediate to AND with?  Just hoping that
> > arm can do 255 as immediate and still efficiently handle this?

We can and already do, and we don't need that representation to do so.
The problem is that handling 255 is already inefficient: it requires an
additional instruction to test the value, whereas we have a fused
test-single-bit-and-branch instruction.

> >
> > Wouldn't this transform be possible in combine with the appropriate
> > backend pattern and combine synthesizing the and for paradoxical
> > subregs?

Not unless we have enough range information in RTL to know that whatever value
has been fed into the cbranch has a range of 1 bit. A range of 8 bits we already
have, and it isn't useful.

The idea was to transform what we currently have:

        tst     w0, 255
        bne     .L4
        ret

i.e. test the bottom 8 bits, into

        tbnz    w0, #0, .L4
        ret

i.e. test only bit 0 and branch based on that bit. We cannot do this when all 
we know is that the range is 8 bits.
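
To make the range argument concrete, here is a rough sketch in plain C++ (not
GCC code; test_low8 and test_bit0 are hypothetical helper names that only model
the two branch conditions). It shows the two tests agree for a 0/1 value but
differ for a general 8-bit value:

```cpp
#include <cstdint>

// Hypothetical models of the two branch conditions (not GCC internals).
// Models "tst w0, 255; bne": branch if any of the low 8 bits is set.
constexpr bool test_low8(std::uint32_t w) { return (w & 255u) != 0; }
// Models "tbnz w0, #0": branch if bit 0 is set.
constexpr bool test_bit0(std::uint32_t w) { return (w & 1u) != 0; }

// With a 1-bit range (value is 0 or 1) the two conditions agree...
static_assert(test_low8(0) == test_bit0(0), "agree on 0");
static_assert(test_low8(1) == test_bit0(1), "agree on 1");
// ...but with only an 8-bit range they can differ, e.g. for the value 2.
static_assert(test_low8(2) != test_bit0(2), "differ on 2");
```

So the tbnz form is only a valid replacement when the 1-bit range is known.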

> 
> Looking at what we produce on aarch64 it seems 'bool' is using an SImode
> register but your characterization that the upper 24 bits have undefined
> content suggests that is a wrong representation?
> If the ABI doesn't say anything about the upper bits we should reflect that
> somehow?

It does. And no, "bool" is using QImode. The expansion of

extern void h ();

void g1(bool x)
{
  if (__builtin_expect (x, 0))
    h ();
}

shows that the argument x is passed in QImode, but like many RISC targets
(and even i386) we promote the argument during expansion:

(insn 2 4 3 2 (set (reg/v:SI 92 [ x ])
        (zero_extend:SI (reg:QI 0 x0 [ x ]))) "/app/example.cpp":4:1 -1
     (nil))

But the value is passed as QImode.

We use this fact to know that the range is 8 bits in the cbranch instruction.
If no operation was done that requires a bigger range, then combine will push
the zero extend into the cbranch, and we have various patterns to handle
different forms of this.

For instance:

void g1(bool *x)
{
  if (__builtin_expect (*x, 0))
    h ();
}

Because of the load of x we generate:

        ldrb    w0, [x0]
        cbnz    w0, .L7
        ret

because we know the top bits are defined to 0 in this case and can just test 
the entire register.
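
As a rough illustration in plain C++ (not GCC code; load_byte_zext,
whole_reg_nonzero and low8_nonzero are hypothetical names), assuming the byte
load zero-extends into the 32-bit register as ldrb does:

```cpp
#include <cstdint>

// Hypothetical model of "ldrb w0, [x0]": the loaded byte is zero-extended.
constexpr std::uint32_t load_byte_zext(std::uint8_t b) { return b; }

// Models "cbnz w0": branch if the whole 32-bit register is non-zero.
constexpr bool whole_reg_nonzero(std::uint32_t w) { return w != 0; }
// Models "tst w0, 255; bne": branch if any of the low 8 bits is set.
constexpr bool low8_nonzero(std::uint32_t w) { return (w & 255u) != 0; }

// After zero-extension the upper 24 bits are 0, so both tests agree.
static_assert(whole_reg_nonzero(load_byte_zext(0x80)) ==
              low8_nonzero(load_byte_zext(0x80)), "agree after zext");
static_assert(whole_reg_nonzero(load_byte_zext(0)) ==
              low8_nonzero(load_byte_zext(0)), "agree on zero");
// If the upper bits were not known to be 0, the tests could differ.
static_assert(whole_reg_nonzero(0x100u) != low8_nonzero(0x100u),
              "differ when upper bits are set");
```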

The reason for this promotion, for us and many other backends, is one of
efficiency. If we didn't promote to something we have native instructions for,
we would have to promote and demote the value at *every* instruction in RTL.

This causes significant noise in the RTL.  So we can't do anything different 
here.  I have plans to try to fix this, but not in GCC 13.

But even then it won't help with this case, because we explicitly need to know 
that the range is a single bit. Not 8 bits.

Regards,
Tamar

> 
> Richard.
