> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Monday, September 26, 2022 1:43 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; jeffreya...@gmail.com;
> Richard Sandiford <richard.sandif...@arm.com>
> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> branches, give hint if argument is a truth type to backend
>
> On Mon, 26 Sep 2022, Richard Biener wrote:
>
> > On Mon, 26 Sep 2022, Tamar Christina wrote:
> >
> > > > Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
> > >
> > > But then I'd still need to change the expansion code. I suppose this
> > > could prevent the issue with changes to code on other targets.
> > >
> > > > > > We have undocumented addcc, negcc, etc. patterns, should we
> > > > > > have aandcc pattern for this indicating support for andcc + jump
> > > > > > as opposed to cmpcc + jump?
> > > > >
> > > > > This could work yeah. I didn't know these existed.
> > >
> > > > Ah, so they are conditional add, not add setting CC, so andcc
> > > > wouldn't be appropriate.
> > >
> > > > So I'm not sure how we'd handle such situation - maybe looking at
> > > > REG_DECL and recognizing a _Bool PARM_DECL is OK?
> > >
> > > I have a slight suspicion that Richard Sandiford would likely reject
> > > this though.. The additional AND seemed less hacky as it's just
> > > communicating range.
> > >
> > > I still need to also figure out which representation of bool is being
> > > used, because only the 0-1 variant works. Is there a way to check that?
> >
> > So another option would be, in case you have (subreg:SI (reg:QI)), if
> > we expand
> >
> >   if (b != 0)
> >
> > expand that to
> >
> >   !((b & 255) == 0)
> >
> > basically invert the comparison and the leverage the paradoxical
> > subreg to specify a narrower immediate to AND with? Just hoping that
> > arm can do 255 as immediate and still efficiently handle this?

We can and already do, and don't need that representation to do so. The
problem is that handling 255 is already inefficient: it requires an
additional instruction to test the value, whereas we have a fused
test-single-bit-and-branch instruction.

> > Wouldn't this transform be possible in combine with the appropriate
> > backend pattern and combine synthesizing the and for paradoxical
> > subregs?

Not unless we have enough range information in RTL to know that whatever
value has been fed into the cbranch has a range of 1 bit. A range of 8 bits
we already have, and it isn't useful.

The idea was to transform what we currently have:

        tst     w0, 255
        bne     .L4
        ret

i.e. test the bottom 8 bits, into

        tbnz    w0, #0, .L4
        ret

i.e. test only bit 0 and branch based on that bit. We cannot do this when
all we know is that the range is 8 bits.

> Looking at what we produce on aarch64 it seems 'bool' is using an SImode
> register but your characterization that the upper 24 bits have undefined
> content suggests that is a wrong representation?
> If the ABI doesn't say anything about the upper bits we should reflect that
> somehow?

It does. And no, "bool" is using QImode. The expansion of

extern void h ();

void g1(bool x)
{
  if (__builtin_expect (x, 0))
    h ();
}

shows that the argument x is passed in QImode, but like many RISC targets
(and even i386) we promote the argument during expansion:

(insn 2 4 3 2 (set (reg/v:SI 92 [ x ])
        (zero_extend:SI (reg:QI 0 x0 [ x ]))) "/app/example.cpp":4:1 -1
     (nil))

But the value is passed as QImode. We use this fact to know that the range
is 8 bits in the cbranch instruction. If no operation was done that requires
a bigger range, then combine will push the zero extend into the cbranch, and
we have various patterns to handle different forms of this.
For instance:

void g1(bool *x)
{
  if (__builtin_expect (*x, 0))
    h ();
}

Because of the load of x we generate:

        ldrb    w0, [x0]
        cbnz    w0, .L7
        ret

because we know the top bits are defined to 0 in this case and can just test
the entire register.

The reason for this promotion, for us and many other backends, is one of
efficiency. If we don't promote to something we have native instructions
for, we would have to promote and demote the value at *every* instruction in
RTL. This causes significant noise in the RTL.

So we can't do anything different here. I have plans to try to fix this, but
not in GCC 13. But even then it won't help with this case, because we
explicitly need to know that the range is a single bit. Not 8 bits.

Regards,
Tamar

>
> Richard.
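
P.S. To make the range argument concrete, here is a minimal sketch in C. It is
only an illustration: the helper names (test_low8, test_bit0,
tests_agree_up_to) are hypothetical and are not part of GCC or of the patch.
The two tests model "tst w0, 255; bne" and "tbnz w0, #0" respectively, and the
sketch checks over which value ranges they give the same branch decision.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models "tst w0, 255; bne": taken if any of the low 8 bits is set.  */
static bool test_low8 (uint32_t reg)
{
  return (reg & 0xffu) != 0;
}

/* Models "tbnz w0, #0": taken if bit 0 is set.  */
static bool test_bit0 (uint32_t reg)
{
  return (reg & 1u) != 0;
}

/* Returns true if both tests agree for every value in [0, max].
   Intended for small max (e.g. 1 or 255) in this illustration.  */
static bool tests_agree_up_to (uint32_t max)
{
  for (uint32_t v = 0; v <= max; v++)
    if (test_low8 (v) != test_bit0 (v))
      return false;
  return true;
}
```

With a known range of one bit (values 0 and 1) the two tests always agree, so
the single-bit form would be a safe replacement. With only an 8-bit range
they disagree as soon as the value is 2, which is why the 8-bit information
alone is not enough.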