https://gcc.gnu.org/g:7c6bd95b1811d797f43983dbbeb4ad478bdb5c9e
commit r13-9517-g7c6bd95b1811d797f43983dbbeb4ad478bdb5c9e Author: Alex Coplan <alex.cop...@arm.com> Date: Mon Mar 10 16:44:15 2025 +0000 df: Treat partial defs as uses in df_simulate_defs [PR116564] The PR shows us spinning in dce.cc:fast_dce at the start of combine. This spinning appears to be because of a disagreement between the fast_dce code and the code in df-problems.cc:df_lr_bb_local_compute. Specifically, they disagree on the treatment of partial defs. For the testcase in the PR, we have the following insn in bb 3: (insn 10 8 13 3 (clobber (subreg:V1DF (reg/v:V2x1DF 104 [ __val ]) 8)) -1 (nil)) which gives rise to a DF def with DF_REF_FLAGS = 0x8b0, i.e. DF_REF_PARTIAL | DF_REF_READ_WRITE | DF_REF_MUST_CLOBBER | DF_REF_SUBREG. Eliding the large block comment for readability, the code in df_lr_bb_local_compute does the following (for each insn): FOR_EACH_INSN_INFO_DEF (def, insn_info) { unsigned int dregno = DF_REF_REGNO (def); bitmap_set_bit (&bb_info->def, dregno); if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)) bitmap_set_bit (&bb_info->use, dregno); else bitmap_clear_bit (&bb_info->use, dregno); } i.e. it models partial defs as a RMW operation; thus for the def arising from i10 above, it records a use of r104; hence it ends up in the live-in set for bb 3. However, as it stands, the code in dce.cc:fast_dce (and its callee dce_process_block) has no such provision for DF_REF_PARTIAL defs. It does not treat these as a RMW and does not compute r104 above as being live-in to bb 3. At the end of dce_process_block we compute the following "did something happen" condition used to decide termination of the analysis: block_changed = !bitmap_equal_p (local_live, DF_LR_IN (bb)); if (block_changed) bitmap_copy (DF_LR_IN (bb), local_live); BITMAP_FREE (local_live); return block_changed; because of the disagreement between df_lr_local_compute and the local analysis done by fast_dce, we invariably have r104 in DF_LR_IN, but not in local_live. Hence we always return true here, call df_analyze_problem (which re-computes DF_LR_IN according to df_lr_bb_local_compute, re-adding r104), and so the analysis never terminates. This patch therefore adjusts df_simulate_defs (called from dce_process_block) to match the behaviour of df_lr_bb_local_compute in this respect, namely we make it model partial defs as RMW operations by setting the relevant register live. This fixes the spinning in fast_dce for this testcase. gcc/ChangeLog: PR rtl-optimization/116564 * df-problems.cc (df_simulate_defs): For partial defs, mark the register live (treat it as a RMW operation). gcc/testsuite/ChangeLog: PR rtl-optimization/116564 * gcc.target/aarch64/torture/pr116564.c: New test. (cherry picked from commit 758e617bcf224dc9d4a7e26dd858d43c1e63b916) Diff: --- gcc/df-problems.cc | 8 +++++--- gcc/testsuite/gcc.target/aarch64/torture/pr116564.c | 11 +++++++++++ 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc index d2cfaf7f50f6..334ecc585646 100644 --- a/gcc/df-problems.cc +++ b/gcc/df-problems.cc @@ -3855,9 +3855,11 @@ df_simulate_defs (rtx_insn *insn, bitmap live) { unsigned int dregno = DF_REF_REGNO (def); - /* If the def is to only part of the reg, it does - not kill the other defs that reach here. */ - if (!(DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))) + /* If the def is to only part of the reg, model it as a RMW operation + by marking it live. It only kills the reg if it is a complete def. */ + if (DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)) + bitmap_set_bit (live, dregno); + else bitmap_clear_bit (live, dregno); } } diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c new file mode 100644 index 000000000000..d471e097294c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/torture/pr116564.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "" } */ +#include <arm_neon.h> +void test() +{ + for (int L = 0; L < 4; ++L) { + float64_t ResData[1 * 2]; + float64x1x2_t Src1; + vst2_f64(ResData, Src1); + } +}