On 04/23/2015 07:56 PM, Alexandre Oliva wrote:
The other tricky bit was to fix all expander bits that required
SSA_NAMEs to have a associated decl. I've removed all such cases, so we
can now expand anonymous SSA decls directly, without having to create an
ignored decl. Doing that, we can coalesce variables and expand each
partition without worrying about choosing a preferred partition leader.
We just have to make sure we don't consider pairs of variables eligible
for coalescing if they should get different promoted modes, or a
different register-or-stack choice, and then expansion of partitions is
streamlined: we just expand each leader, and then adjust all SSA_NAMEs
to associate the RTL with their base variables, if any.
Nice.
In this revision of the patch, I have retained -ftree-coalesce-vars, so
that its negated form can be used in testcases that formerly expected no
coalescing across user variables, but that explicitly disabled VTA.
Seems reasonable.
As for testcases, while investigating test regressions, I found out
various guality failures had to do with VT's lack of awareness of custom
calling conventions. Caller's variables saved in registers that are
normally call-clobbered, but that are call-saved in custom conventions
set up for a callee, would end up invalidating the entry-point location
associations. I've arranged for var-tracking to use custom calling
conventions for register invalidation at call insns, and this fixed not
only a few guality regressions due to changes in register assignment,
but a number of other long-standing guality failures. Yay! This could
be split out into a standalone patch.
That might be wise -- I think we're going to need at least one more
iteration on the removal of copyrename.
In this version of the patch, we no longer touch the base vars at all.
We just associate the piece of RTL generated for the partition with a
list of decls, if needed. (I've just realized that I never noticed a
list of decls show up anywhere, and looking into this, I saw a bug in
the leader_merge function, that causes it to fail to go from a single
entry to a list: it creates the list, but then returns the original
non-list entry; that's why I never saw them! I won't delay posting the
patch just because of this; I'm not even sure we want decl lists in REG
or MEM attrs begin with)
Well, Richi noted the compile-time cost and poor data structure choice.
I'd ask the question, what's the benefit in tracking these as a list?
If we want to track, then how often do we need to actually traverse the
list, how hard would it be to build a pathological case (from a compile
time standpoint). Presumably there's no way to sort the list to make
finding an entry cheap?
I have collected some statistics on the effects of the patch in
compiling stage3-gcc/, before and after the patch, with and without
-fno-tree-coalesce-vars. I counted, per function:
b/a: before the patch, or after the patch
c/n: -ftree-coalesce-vars (default when optimizing) or
-fno-tree-coalesce-vars
cv: the coalescible var count, i.e., the active partition count prior to
coalescing. SSA_NAMEs not elligible for coalescing are not counted.
The more of these there are, the larger the conflict graph we have to
build.
base: the base variable count that guides the construction of the
conflict map. The more of these there are, the smaller the conflict
graph we have to build, but it is also a lower bound for the final
partition count.
part: the partition count after coalescing, not counting those of
SSA_NAMEs that were not elligible for coalescing to begin with.
abn: successful abnormal coalesce count. How many times
attempt_coalesce returned true as called in the abnormal coalesce loop.
same: successful normal coalesces of pairs of SSA_NAMEs that share the
same base variable (SSA_NAME_VAR, not the base index used to guide the
construction of the conflict graph). Ignored base decls are regarded as
NULL for purposes of this comparison. How many times attempt_coalesce
returned true for variables that share the same base variable. This may
count cases in which both vars are in the same partition already due to
earlier coalesces.
other: successful normal coalesces of pairs of SSA_NAMEs that do NOT
share the same base variable. Same caveats as above.
fail: failed attempts at normal coalece. How many times
attempt_coalesce returned false.
b/a c/n cv base part abn same other fail
before -fno-tr 570180 176682 221442 82076 370746 0 10542
before -ftree- 577212 171581 221927 82076 378093 0 18654
after -fno-tr 608533 179959 220948 82076 488119 0 11697
after -ftree- 589243 202588 221817 82076 349373 41775 24124
I've spent quite a bit of time trying to figure out what all this means.
I think the takeaway is we'll use a bit more memory, but we also
coalesce a bit better. Neither effect appears to be very large.
And here's the actual patch I'm submitting for your appreciation (I was
gonna say for inclusion, but given the leader_merge brown paper bag bug,
I'll just want feedback on whether we want that or not, and either drop
the list-building, or probably post a revised patch that fixes fallout
from lists where decls are expected.)
No regressions, and many progressions, on x86_64-linux-gnu and
i686-pc-linux-gnu.
[PR64164] Drop copyrename, use coalescible partition as base when optimizing.
for gcc/ChangeLog
PR rtl-optimization/64164
* Makefile.in (OBJS): Drop tree-ssa-copyrename.o.
* tree-ssa-copyrename.c: Removed.
* opts.c (default_options_table): Drop -ftree-copyrename. Add
-ftree-coalesce-vars.
* passes.def: Drop all occurrences of pass_rename_ssa_copies.
* common.opt (ftree-copyrename): Ignore.
(ftree-coalesce-inlined-vars): Likewise.
* doc/invoke.texi: Remove the ignored options above.
* gimple-expr.h (gimple_can_coalesce_p): Note def location.
* gimple-expr.c (gimple_can_coalesce_p): Allow coalescing
across variables when flag_tree_coalesce_vars. Check register
use and promoted modes to allow coalescing. Moved to
tree-ssa-coalesce.c.
* tree-ssa-live.c (struct tree_int_map_hasher): Move along
with its member functions to tree-ssa-coalesce.c.
(var_map_base_init): Likewise. Renamed to
compute_samebase_partition_bases.
(partition_view_normal): Drop want_bases parameter.
(partition_view_bitmap): Likewise.
* tree-ssa-live.h: Adjust declarations.
* tree-ssa-coalesce.c: Include explow.h.
(build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's
default defs at the entry point.
(dump_part_var_map): New.
(compute_optimized_partition_bases): New, called by...
(coalesce_ssa_name): ... when flag_tree_coalesce_vars, instead
of compute_samebase_partition_bases. Adjust.
* alias.c (nonoverlapping_memrefs_p): Special-case RTL-less
gimple-reg exprs.
* cfgexpand.c (leader_merge): New.
(get_rtl_for_parm_ssa_default_def): New.
(set_rtl): Merge exprs and attrs, even for MEMs and non-SSA
vars. Update DECL_RTL for PARM_DECLs and RESULT_DECLs too.
(expand_one_stack_var_at): Handle anonymous SSA_NAMEs. Drop
redundant MEM attr setting.
(expand_one_stack_var_1): Handle anonymous SSA_NAMEs. Renamed
from...
(expand_one_stack_var): ... this. New wrapper to check and
skip already expanded SSA partitions.
(record_alignment_for_reg_var): New, factored out of...
(expand_one_var): ... this.
(expand_one_ssa_partition): New.
(adjust_one_expanded_partition_var): New.
(expand_one_register_var): Check and skip already expanded SSA
partitions.
(expand_used_vars): Don't create DECLs for anonymous SSA
names. Expand all SSA partitions, then adjust all SSA names.
(pass::execute): Replace the loops that set
SA.partition_to_pseudo from partition leaders and cleared
DECL_RTL for multi-location variables, and that which used to
rename vars and set attrs, with one that clears DECL_RTL and
checks that PARMs and RESULTs default_defs match DECL_RTL.
* cfgexpand.h (get_rtl_for_parm_ssa_default_def): Declare.
* emit-rtl.c (set_reg_attrs_for_parm): Handle NULL and
TREE_LIST decl.
* explow.c (promote_ssa_mode): New.
* explow.h (promote_ssa_mode): Declare.
* expr.c (expand_expr_real_1): Handle anonymous SSA_NAMEs.
* function.c: Include cfgexpand.h.
(use_register_for_decl): Handle SSA_NAMEs, anonymous or not.
(use_register_for_parm_decl): Wrapper for the above to
special-case the result_ptr.
(rtl_for_parm): Ditto for get_rtl_for_parm_ssa_default_def.
(maybe_reset_rtl_for_parm): Reset DECL_RTL of parms with
multiple locations.
(assign_parm_adjust_stack_rtl): Add all and parm arguments,
for rtl_for_parm. For SSA-assigned parms, zero stack_parm.
(assign_parm_setup_block): Prefer SSA-assigned location.
(assign_parm_setup_reg): Likewise. Use entry_parm for equiv
if stack_parm is NULL.
(assign_parm_setup_stack): Prefer SSA-assigned location.
(assign_parms): Maybe reset DECL_RTL of params. Adjust stack
rtl before testing for pointer bounds. Special-case result_ptr.
(expand_function_start): Maybe reset DECL_RTL of result.
Prefer SSA-assigned location for result and static chain.
Factor out DECL_RESULT and SET_DECL_RTL.
* tree-outof-ssa.c (insert_value_copy_on_edge): Handle
anonymous SSA names. Use promote_ssa_mode.
(get_temp_reg): Likewise.
(remove_ssa_form): Adjust.
* var-tracking.c (dataflow_set_clear_at_call): Take call_insn
and get its reg_usage for reg invalidation.
(compute_bb_dataflow): Pass it insn.
(emit_notes_in_bb): Likewise.
for gcc/testsuite/ChangeLog
* gcc.dg/guality/pr54200.c: Add -fno-tree-coalesce-vars.
* gcc.dg/ssp-1.c: Make counter a register.
* gcc.dg/ssp-2.c: Likewise.
* gcc.dg/torture/parm-coalesce.c: New.
Just a few comments in addition to Richi's....
---
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index ca491a0..74190a6d 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -179,21 +179,137 @@ gimple_assign_rhs_to_tree (gimple stmt)
#define SSAVAR(x) (TREE_CODE (x) == SSA_NAME ? SSA_NAME_VAR (x) : x)
+/* NEXT is a DECL to be associated with some RTL, CUR is a DECL or a
+ TREE_LIST of DECLs. If NEXT is covered by CUR, return CUR
+ unchanged. Otherwise, return a list with all entries of CUR, with
+ NEXT at the end. If CUR was a list, it will be modified in
+ place. */
+
+static tree
+leader_merge (tree cur, tree next)
+{
+ if (cur == NULL || cur == next)
+ return next;
+
+ tree list;
+
+ if (TREE_CODE (cur) == TREE_LIST)
+ {
+ /* Look for NEXT in the list. Stop at the last node to insert
+ there. */
+ for (list = cur; ; list = TREE_CHAIN (list))
+ {
+ if (TREE_VALUE (list) == next)
+ return cur;
+ if (!TREE_CHAIN (list))
+ break;
+ }
+ }
+ else
+ /* Create the first node. */
+ list = build_tree_list (NULL, cur);
+
+ next = build_tree_list (NULL, next);
+ TREE_CHAIN (list) = next;
+
+ return cur;
+}
As Richi notes, avoid TREE_LIST :-) I suspect a vec would be an
improvement here. How often do we have more than one entry? How often
do we have to search this list and how hard is it to trigger
pathological behaviour here? If we're not gaining much, consider
dropping this completely. It's the most controversial part of the patch.
+
+ /* If this decl was marked as living in multiple places, reset
+ this now to NULL. */
+ tree var = SSA_NAME_VAR (name);
+ if (var && DECL_RTL_IF_SET (var) == pc_rtx)
Do we document the special meaning of pc_rtx in DECL_RTL?
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index b8dc7d5..ef31ba0f 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1229,6 +1229,11 @@ set_reg_attrs_for_parm (rtx parm_rtx, rtx mem)
void
set_reg_attrs_for_decl_rtl (tree t, rtx x)
{
+ if (!t)
+ return;
+ tree tdecl = t;
+ if (TREE_CODE (t) == TREE_LIST)
+ tdecl = TREE_VALUE (t);
As Richi mentioned, we only use the first entry, which begs the
question, do we need the the leader_merge bits at all.
Jeff