On 11/03/2025 17:39, Richard Sandiford wrote: > Alex Coplan <alex.cop...@arm.com> writes: > > Hi, > > > > The PR shows us spinning in dce.cc:fast_dce at the start of combine. > > This spinning appears to be because of a disagreement between the fast_dce > > code > > and the code in df-problems.cc:df_lr_bb_local_compute. Specifically, they > > disagree on the treatment of partial defs. For the testcase in the PR, we > > have > > the following insn in bb 3: > > > > (insn 10 8 13 3 (clobber (subreg:V1DF (reg/v:V2x1DF 104 [ __val ]) 8)) -1 > > (nil)) > > > > which gives rise to a DF def with DF_REF_FLAGS = 0x8b0, i.e. > > DF_REF_PARTIAL | DF_REF_READ_WRITE | DF_REF_MUST_CLOBBER | DF_REF_SUBREG. > > > > Eliding the large block comment for readability, the code in > > df_lr_bb_local_compute does the following (for each insn): > > > > FOR_EACH_INSN_INFO_DEF (def, insn_info) > > { > > unsigned int dregno = DF_REF_REGNO (def); > > bitmap_set_bit (&bb_info->def, dregno); > > if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)) > > bitmap_set_bit (&bb_info->use, dregno); > > else > > bitmap_clear_bit (&bb_info->use, dregno); > > } > > > > i.e. it models partial defs as a RMW operation; thus for the def arising > > from i10 above, it records a use of r104; hence it ends up in the > > live-in set for bb 3. > > > > However, as it stands, the code in dce.cc:fast_dce (and its callee > > dce_process_block) has no such provision for DF_REF_PARTIAL defs. It > > does not treat these as a RMW and does not compute r104 above as being > > live-in to bb 3. At the end of dce_process_block we compute the > > following "did something happen" condition used to decide termination of > > the analysis: > > > > block_changed = !bitmap_equal_p (local_live, DF_LR_IN (bb)); > > if (block_changed) > > bitmap_copy (DF_LR_IN (bb), local_live); > > > > BITMAP_FREE (local_live); > > return block_changed; > > > > because of the disagreement between df_lr_local_compute and the local > > analysis done by fast_dce, we invariably have r104 in DF_LR_IN, but not > > in local_live. Hence we always return true here, call > > df_analyze_problem (which re-computes DF_LR_IN according to > > df_lr_bb_local_compute, re-adding r104), and so the analysis never > > terminates. > > > > This patch therefore adjusts df_simulate_defs (called from > > dce_process_block) to match the behaviour of df_lr_bb_local_compute in > > this respect, namely we make it model partial defs as RMW operations by > > setting the relevant register live. This fixes the spinning in fast_dce > > for this testcase. > > > > Bootstrapped/regtested on aarch64-linux-gnu, arm-linux-gnueabihf, and > > x86_64-linux-gnu: no regressions. I also checked that this is netural > > for SPEC CPU 2017 on Graviton 3. OK for trunk? If so, what about > > backports? > > > > Thanks, > > Alex > > > > gcc/ChangeLog: > > > > PR rtl-optimization/116564 > > * df-problems.cc (df_simulate_defs): For partial defs, mark the > > register live (treat it as a RMW operation). > > > > gcc/testsuite/ChangeLog: > > > > PR rtl-optimization/116564 > > * gcc.target/aarch64/torture/pr116564.c: New test. > > OK, thanks. I think it's also ok to backport after a settling period > (but probably best to leave a reasonable amount of time before backporting, > since this is sensitive code).
Thanks, pushed to trunk as g:758e617bcf224dc9d4a7e26dd858d43c1e63b916. Yes this seems like sensitive code indeed, so I'll leave this for at least a couple of weeks before thinking about backports. Alex > > Richard > > > > > diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc > > index f32185b3eac..90753793098 100644 > > --- a/gcc/df-problems.cc > > +++ b/gcc/df-problems.cc > > @@ -3893,9 +3893,11 @@ df_simulate_defs (rtx_insn *insn, bitmap live) > > { > > unsigned int dregno = DF_REF_REGNO (def); > > > > - /* If the def is to only part of the reg, it does > > - not kill the other defs that reach here. */ > > - if (!(DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))) > > + /* If the def is to only part of the reg, model it as a RMW operation > > + by marking it live. It only kills the reg if it is a complete def. */ > > + if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)) > > + bitmap_set_bit (live, dregno); > > + else > > bitmap_clear_bit (live, dregno); > > } > > } > > diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c > > b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c > > new file mode 100644 > > index 00000000000..d471e097294 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c > > @@ -0,0 +1,11 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "" } */ > > +#include <arm_neon.h> > > +void test() > > +{ > > + for (int L = 0; L < 4; ++L) { > > + float64_t ResData[1 * 2]; > > + float64x1x2_t Src1; > > + vst2_f64(ResData, Src1); > > + } > > +}