On Wed, May 31, 2023 at 12:29 AM Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On Wed, May 31, 2023 at 6:34 AM Andrew Pinski via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > After r14-1014-gc5df248509b489364c573e8, GCC started to emit > > directly a zero_extract for `(t1&0x8)!=0`. This introduced > > a small regression where ifcvt would not do the ifconversion > > as there is now a paradoxical subreg in the dest which > > was being rejected. Since paradoxical subreg set the whole > > register, we can treat it as the same as a reg in the two places. > > > > OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu. > > OK I guess. I vaguely remember SUBREG_PROMOTED_UNSIGNED_P > applies to non-paradoxical subregs but I might be swapping things - maybe > you remember better and whether that would cause any issues here?
So I looked into the history of the code in ifcvt.cc, this code was added with r6-3071-ge65bf4e814d38c to accept more complex bb (https://inbox.sourceware.org/gcc-patches/559fbb13.80...@arm.com/). The thread where we start talking about subregs is located with Jeff's email starting here: https://inbox.sourceware.org/gcc-patches/55bbafac.5020...@redhat.com/ . Jeff, I know Richard already approved this patch but could you provide a second eye as you were involved reviewing the original code here and I want to make sure I understood the code in a a reasonable fashion? Thanks, Andrew > > Thanks, > Richard. > > > gcc/ChangeLog: > > > > PR rtl-optimization/110042 > > * ifcvt.cc (bbs_ok_for_cmove_arith): Allow paradoxical subregs. > > (bb_valid_for_noce_process_p): Strip the subreg for the SET_DEST. > > > > gcc/testsuite/ChangeLog: > > > > PR rtl-optimization/110042 > > * gcc.target/aarch64/csel_bfx_2.c: New test. > > --- > > gcc/ifcvt.cc | 14 ++++++---- > > gcc/testsuite/gcc.target/aarch64/csel_bfx_2.c | 27 +++++++++++++++++++ > > 2 files changed, 36 insertions(+), 5 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/csel_bfx_2.c > > > > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc > > index 868eda93251..0b180b4568f 100644 > > --- a/gcc/ifcvt.cc > > +++ b/gcc/ifcvt.cc > > @@ -2022,7 +2022,7 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block > > bb_b, rtx to_rename) > > } > > > > /* Make sure this is a REG and not some instance > > - of ZERO_EXTRACT or SUBREG or other dangerous stuff. > > + of ZERO_EXTRACT or non-paradoxical SUBREG or other dangerous stuff. > > If we have a memory destination then we have a pair of simple > > basic blocks performing an operation of the form [addr] = c ? a : > > b. > > bb_valid_for_noce_process_p will have ensured that these are > > @@ -2030,7 +2030,8 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block > > bb_b, rtx to_rename) > > to be renamed. Assert that the callers set this up properly. */ > > if (MEM_P (SET_DEST (sset_b))) > > gcc_assert (rtx_equal_p (SET_DEST (sset_b), to_rename)); > > - else if (!REG_P (SET_DEST (sset_b))) > > + else if (!REG_P (SET_DEST (sset_b)) > > + && !paradoxical_subreg_p (SET_DEST (sset_b))) > > { > > BITMAP_FREE (bba_sets); > > return false; > > @@ -3136,14 +3137,17 @@ bb_valid_for_noce_process_p (basic_block test_bb, > > rtx cond, > > > > rtx sset = single_set (insn); > > gcc_assert (sset); > > + rtx dest = SET_DEST (sset); > > + if (SUBREG_P (dest)) > > + dest = SUBREG_REG (dest); > > > > if (contains_mem_rtx_p (SET_SRC (sset)) > > - || !REG_P (SET_DEST (sset)) > > - || reg_overlap_mentioned_p (SET_DEST (sset), cond)) > > + || !REG_P (dest) > > + || reg_overlap_mentioned_p (dest, cond)) > > goto free_bitmap_and_fail; > > > > potential_cost += pattern_cost (sset, speed_p); > > - bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset))); > > + bitmap_set_bit (test_bb_temps, REGNO (dest)); > > } > > } > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/csel_bfx_2.c > > b/gcc/testsuite/gcc.target/aarch64/csel_bfx_2.c > > new file mode 100644 > > index 00000000000..c3b8a6f45cc > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/csel_bfx_2.c > > @@ -0,0 +1,27 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2" } */ > > +unsigned > > +f1(int t, int t1) > > +{ > > + int tt = 0; > > + if(t) > > + tt = (t1&0x8)!=0; > > + return tt; > > +} > > +struct f > > +{ > > + unsigned t:3; > > + unsigned t1:4; > > +}; > > +unsigned > > +f2(int t, struct f y) > > +{ > > + int tt = 0; > > + if(t) > > + tt = y.t1; > > + return tt; > > +} > > +/* Both f1 and f2 should produce a csel and not a cbz on the argument. */ > > +/* { dg-final { scan-assembler-times "csel\t" 2 } } */ > > +/* { dg-final { scan-assembler-times "ubfx\t" 2 } } */ > > +/* { dg-final { scan-assembler-not "cbz\t" } } */ > > -- > > 2.31.1 > >