Tamar Christina <tamar.christ...@arm.com> writes:
> Hi All,
>
> We were analyzing code quality after recent changes and have noticed that
> the tbz support somehow managed to increase the number of branches overall
> rather than decrease it.
>
> While investigating this we figured out that the problem is that when an
> & <constant> already exists in gimple, and the instruction is generated
> because of the range information derived from the ANDed constant, we end
> up with a no-op AND in the RTL expansion.
>
> Normally this is not a problem, as CSE will take care of it.  The issue
> arises when the original AND was done in a location from which PRE or FRE
> "lift" the AND to a different basic block.  This triggers a problem when
> the resulting value is not single use.  Instead of an AND and a tbz, we
> end up generating AND + TST + BR if the mode is HI or QI.
>
> This CSE across basic blocks was a problem before, but this change made
> it worse.  Our branch patterns rely on combine being able to fold the AND
> or zero_extends into the instructions.
>
> To work around this (since a proper fix is outside the scope of stage 4)
> we are limiting the new tbranch optab to HI and QI mode values only.
> This isn't a problem because these are the two modes for which we don't
> have CBZ support, so they are the problematic cases to begin with.
> Additionally, booleans are QI.
>
> The second thing we're doing is limiting the only legal bitpos to
> position 0, i.e. only the bottom bit.  This is so that we prevent the
> double ANDs as much as possible.
>
> Most other cases, i.e. where we had an explicit & in the source code, are
> still handled correctly by the anonymous
> (*tb<optab><ALLI:mode><GPI:mode>1) pattern that was added along with
> tbranch support.
>
> This means we don't expand the superfluous AND here, and while it doesn't
> fix the problem that we lose tbz in the cross-BB case, it also doesn't
> make things worse.
>
> With these tweaks we've now reduced the number of insns uniformly, as
> originally expected.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (tbranch_<code><mode>3): Restrict to
>   SHORT and bottom bit only.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/tbz_2.c: New test.
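
To make the failure mode concrete, here is a minimal hypothetical example
(not taken from the patch or its testsuite; the names are made up).  The
AND result is multi-use across basic blocks, so combine cannot fold it
into a single TBNZ:

  void h (void);

  int f (unsigned char x, int y)
  {
    int t = x & 1;    /* FRE/PRE can keep a single copy of this AND
                         in a dominating block.  */
    if (y)
      h ();
    if (t)            /* Since t is not single use, this bit test
                         could become AND + TST + BNE rather than
                         folding into one TBNZ.  */
      h ();
    return t;         /* The second use keeps the AND result live.  */
  }
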
Agreed that reducing the scope of the new optimisation seems like a safe
compromise for GCC 13.  But could you add a testcase that shows the effect
of both changes (reducing the mode selection and the bit selection)?  The
test below passes even without the patch.

It would be good to have a PR tracking this too.

Personally, I think we should try to get to the stage where gimple does
the CSE optimisations we need, and where the tbranch optab can generate a
tbz directly (rather than splitting it apart and hoping that combine will
put it back together later).

Thanks,
Richard

> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 2c1367977a68fc8e4289118e07bb61398856791e..aa09e93d85e9628e8944e03498697eb9597ef867 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -949,8 +949,8 @@ (define_insn "*cb<optab><mode>1"
>
>  (define_expand "tbranch_<code><mode>3"
>    [(set (pc) (if_then_else
> -	      (EQL (match_operand:ALLI 0 "register_operand")
> -		   (match_operand 1 "aarch64_simd_shift_imm_<mode>"))
> +	      (EQL (match_operand:SHORT 0 "register_operand")
> +		   (match_operand 1 "const0_operand"))
> 	       (label_ref (match_operand 2 ""))
> 	       (pc)))]
>    ""
> diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_2.c b/gcc/testsuite/gcc.target/aarch64/tbz_2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ec128b58f35276a7c5452685a65c73f95f2d5f9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/tbz_2.c
> @@ -0,0 +1,130 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=c99 -fno-unwind-tables -fno-asynchronous-unwind-tables" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#include <stdbool.h>
> +
> +void h(void);
> +
> +/*
> +** g1:
> +**	cbnz	w0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g1(int x)
> +{
> +  if (__builtin_expect (x, 0))
> +    h ();
> +}
> +
> +/*
> +** g2:
> +**	tbnz	x0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g2(int x)
> +{
> +  if (__builtin_expect (x & 1, 0))
> +    h ();
> +}
> +
> +/*
> +** g3:
> +**	tbnz	x0, 3, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g3(int x)
> +{
> +  if (__builtin_expect (x & 8, 0))
> +    h ();
> +}
> +
> +/*
> +** g4:
> +**	tbnz	w0, #31, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g4(int x)
> +{
> +  if (__builtin_expect (x & (1 << 31), 0))
> +    h ();
> +}
> +
> +/*
> +** g5:
> +**	tst	w0, 255
> +**	bne	.L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g5(char x)
> +{
> +  if (__builtin_expect (x, 0))
> +    h ();
> +}
> +
> +/*
> +** g6:
> +**	tbnz	w0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g6(char x)
> +{
> +  if (__builtin_expect (x & 1, 0))
> +    h ();
> +}
> +
> +/*
> +** g7:
> +**	tst	w0, 3
> +**	bne	.L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g7(char x)
> +{
> +  if (__builtin_expect (x & 3, 0))
> +    h ();
> +}
> +
> +/*
> +** g8:
> +**	tbnz	w0, 7, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g8(char x)
> +{
> +  if (__builtin_expect (x & (1 << 7), 0))
> +    h ();
> +}
> +
> +/*
> +** g9:
> +**	tbnz	w0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g9(bool x)
> +{
> +  if (__builtin_expect (x, 0))
> +    h ();
> +}
> +
> +/*
> +** g10:
> +**	tbnz	w0, 0, .L[0-9]+
> +**	ret
> +**	...
> +*/
> +void g10(bool x)
> +{
> +  if (__builtin_expect (x & 1, 0))
> +    h ();
> +}
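
As an illustration of the direction suggested above, here is a rough,
untested sketch (an assumption, not code from the thread): it reuses the
zero_extract shape of the existing *tb<optab><ALLI:mode><GPI:mode>1 insn
so that no separate AND is ever emitted; the exact modes and predicates
are guesses:

  ;; Hypothetical sketch: expand the tbranch optab directly to the
  ;; zero_extract form that the TBZ/TBNZ insn matches, instead of
  ;; emitting an AND and hoping combine rebuilds the bit test.
  (define_expand "tbranch_<code><mode>3"
    [(set (pc) (if_then_else
		 (EQL (zero_extract:DI
			(match_operand:SHORT 0 "register_operand")
			(const_int 1)
			(match_operand 1 "const0_operand"))
		      (const_int 0))
		 (label_ref (match_operand 2 ""))
		 (pc)))]
    ""
  )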