On 11/20/2025 11:03 AM, Konstantinos Eleftheriou wrote:
This patch converts the fold-mem-offsets pass from DF to RTL-SSA.
Along with this conversion, the way the pass collects information
was completely reworked. Instead of visiting each instruction multiple
times, this is now done only once.
The most significant changes are:
* The pass operates mainly on insn_info objects from RTL-SSA.
* A single iteration over all nondebug INSNs identifies the
fold-mem-roots. Then each fold-mem-root's DEF-chain is walked
to collect foldable constants.
* The class fold_mem_info holds vectors for the DEF-chain of
the to-be-folded INSNs (fold_agnostic_insns, which don't need
to be adjusted, and fold_insns, which need their constant to
be set to zero).
* Introduction of a single-USE mode, which only collects DEFs
that have a single USE and are therefore safe to transform
(the fold-mem-root will be the final USE). This mode is fast
and always runs (unless disabled via -fno-fold-mem-offsets).
* Introduction of a multi-USE mode, which allows DEFs to have
multiple USEs, provided every USE is part of some fold-mem-root's
DEF-chain. Analyzing all USEs is expensive, so this mode is
disabled for highly connected CFGs. Note that multi-USE mode
will miss some opportunities that the single-USE mode finds
(e.g. multi-USE mode fails for fold-mem-offsets-3.c).
The following testing was done:
* Bootstrapped and regtested on aarch64-linux, x86-64-linux and riscv64-linux.
* SPEC CPU 2017 tested on aarch64.
A compile-time analysis with `/bin/time -v ./install/usr/local/bin/gcc -O2 all.i`
(all.i from PR117922) shows:
* -fno-fold-mem-offsets: 464 s (user time) / 26280384 kBytes (max resident set size)
* -ffold-mem-offsets: 395 s (user time) / 26281388 kBytes (max resident set size)
Adding -fexpensive-optimizations to enable multi-USE mode does not have
an impact on the duration or the memory footprint.
SPEC CPU 2017 showed no significant performance impact on aarch64-linux.
gcc/ChangeLog:

	PR rtl-optimization/117922
	* fold-mem-offsets.cc (INCLUDE_ALGORITHM): Added definition.
	(INCLUDE_FUNCTIONAL): Likewise.
	(INCLUDE_ARRAY): Likewise.
	(class pass_fold_mem_offsets): Moved to bottom of file.
	(class change_info): New.
	(get_single_def_in_bb): Converted to RTL-SSA.
	(get_fold_mem_offset_root): Converted to RTL-SSA.
	(get_uses): New.
	(fold_offsets): Converted to RTL-SSA.
	(fold_offsets_1): Converted to RTL-SSA.
	(has_foldable_uses_p): Converted to RTL-SSA.
	(get_fold_mem_root): Removed.
	(insn_uses_not_in_bitmap): New.
	(drop_unsafe_candidates): New.
	(do_commit_offset): Converted to RTL-SSA.
	(do_analysis): Removed.
	(do_commit_insn): Converted to RTL-SSA.
	(do_fold_info_calculation): Removed.
	(sort_changes): New.
	(sort_pairs): New.
	(do_check_validity): Removed.
	(get_last_def): New.
	(move_uses_to_prev_def): New.
	(compute_validity_closure): Removed.
	(change_in_vec_p): New.
	(cancel_changes_for_group): New.
	(find_keys_to_remove): New.
	(update_insns): New.
	(fold_mem_offsets_1): New.
	(pass_fold_mem_offsets::execute): Moved to bottom of file.
	(fold_mem_offsets): New.

gcc/testsuite/ChangeLog:

	* g++.target/aarch64/fold-mem-offsets.C: New test.
	* gcc.target/aarch64/fold-mem-offsets.c: New test.
So coming back to this... As you are no doubt aware, there have been a couple of bugfixes to fold-mem-offsets ACK'd and pushed into the tree which mean this no longer applies cleanly. One of those committed patches just adjusts a comment, and AFAICT the comment just isn't important anymore after the rewrite. The other change was a bugfix to make sure we distinguish between the mode of the MEM vs the mode of the address. It appears that this new code handles that correctly.
So conceptually the easiest way to move forward is to revert the two changes, apply your patch and generate a fresh diff against the tip of the trunk.
Co-authored-by: Christoph Müllner <[email protected]>
Signed-off-by: Konstantinos Eleftheriou <[email protected]>
---
Changes in v3:
- Re-enable on all targets.
- Re-add RISC-V testcases.

Changes in v2:
- Use the RTL-SSA changes framework for the instruction changes.
- Keep a hash map of the changes and try to cancel the minimum number of
  changes when something goes wrong.
- Remove redundant code.
- Add AArch64 testcases.
- Remove RISC-V testcases.

Changes in v1:
- Convert the fold-mem-offsets pass from DF to RTL-SSA.

 gcc/fold-mem-offsets.cc                       | 1450 +++++++++++------
 .../g++.target/aarch64/fold-mem-offsets.C     |   86 +
 .../gcc.target/aarch64/fold-mem-offsets.c     |   19 +
 3 files changed, 1016 insertions(+), 539 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/fold-mem-offsets.C
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fold-mem-offsets.c

diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
index c1c94472a071..b3438a393101 100644
--- a/gcc/fold-mem-offsets.cc
+++ b/gcc/fold-mem-offsets.cc
-  /* We only fold through instructions that are transitively used as
-     memory addresses and do not have other uses.  Use the same logic
-     from offset calculation to visit instructions that can propagate
-     offsets and keep track of them in CAN_FOLD_INSNS.  */
-  bool success;
-  struct df_link *uses = get_uses (def, dest, &success), *ref_link;
-  if (!success)
-    return 0;
+/* Function that calculates the offset for INSN that would have to be added to
+   all its USEs of REG.  Foldable INSNs are added to INFO->fold_insns and
+   fold-agnostic INSNs are added to INFO->fold_agnostic_insns.
+   It is possible that some INSNs are added to both lists.  In this case the
+   INSN is a fold INSN.
-  for (ref_link = uses; ref_link; ref_link = ref_link->next)
-    {
-      rtx_insn *use = DF_REF_INSN (ref_link->ref);
+   Returns the offset on success or 0 if the calculation fails.  */
+static HOST_WIDE_INT
+fold_offsets (insn_info *insn, rtx reg, fold_mem_info *info,
+	      bool single_use = true)

So a question. What if, after squashing together all the calculations, the final offset is zero? Won't that be treated as a failure? It feels like we need a boolean for good/bad and a HOST_WIDE_INT for the updated offset?
I do see one notable failure so far in my tester. csky-linux-gnu trips this assertion in fold_offsets:
563 gcc_assert (REGNO (reg) == REGNO (SET_DEST (PATTERN (def_rtl))));
The insn in question is a bit odd, but the real issue is that I don't think you can assume PATTERN (def_rtl) is going to be a single set, and thus you can't use SET_DEST on it blindly.
(gdb) p debug_rtx (def_rtl)
(insn 588 137 139 15 (parallel [
(use (and:SI (not:SI (const_int 3 [0x3]))
(reg/f:SI 0 a0 [orig:211 _10 ] [211])))
(set (reg/v:SI 0 a0 [orig:212 a ] [212])
(and:SI (not:SI (const_int 3 [0x3]))
(reg/f:SI 0 a0 [orig:211 _10 ] [211])))
]) "../../../../gcc/libgcc/unwind-pe.h":206:9 280 {cskyv2_andnsi3}
(nil))
Testcase attached. Compile with -O2.

Overall it looks pretty good to me, so I think we just shake out any failures and go with it once gcc-17 is open for development. I'll keep it enabled in my tester so that we pick up weirdos like alpha, hppa, m68k, etc.
jeff
unwind-dw2.i.gz
Description: GNU Zip compressed data
