On 06/26/14 02:58, Richard Biener wrote:
On Thu, 26 Jun 2014, Richard Biener wrote:
On Thu, 26 Jun 2014, Richard Biener wrote:
On Wed, 25 Jun 2014, Jeff Law wrote:
On 06/25/14 08:05, Richard Biener wrote:
This removes restrictions in DOM cprop_operand that inhibit
some optimizations. The volatile pointer thing is really realy
old and no longer necessary while the loop-depth consideration
is only valid for loop-closed PHI nodes (but we're not in
loop-closed SSA in DOM) - the coalescing is handled in out-of-SSA
phase by inserting copies appropriately.
Bootstrapped on x86_64-unknown-linux-gnu, ok?
Thanks,
Richard.
2014-06-25 Richard Biener <rguent...@suse.de>
PR tree-optimization/61607
* tree-ssa-dom.c (cprop_operand): Remove restriction on
propagating volatile pointers and on loop depth.
The first hunk is OK.
I thought we had tests for the do not copy propagate out of a loop nest in the
suite. Did you check that tests in BZ 19038 still generate good code after
this change? If we still generate good code for those tests, then this hunk
is fine too.
I have applied the first hunk and will investigate further. Testing
didn't show any issue and I know how to retain the check but not
cause the missed optimization shown in PR61607.
Let's try to summarize what the restriction is supposed to avoid.
It tries to avoid introducing uses of SSA names defined inside a
loop outside of it because if the SSA name is live over the backedge
we will then have an overlapping life-range which prevents out-of-SSA
from coalescing it to a single register.
Now, the existing test is not working in that way.
Rather the best way we have to ensure this property (all outside
uses go through a copy that is placed on exit edges rather than
possibly on the backedge) is to go into loop-closed SSA form.
This is also where the PHI nodes that confuse DOM in PR61607
come from in the first place.
Now as the existing measure is ineffective in some cases out-of-SSA
has gotten the ability to deal with this (or a subset):
/* If elimination of a PHI requires inserting a copy on a backedge,
then we will have to split the backedge which has numerous
undesirable performance effects.
A significant number of such cases can be handled here by inserting
copies into the loop itself. */
insert_backedge_copies ();
now, this doesn't seem to deal with outside uses. But eventually
the coalescing code already assigns proper cost to backedge copies
so that we choose to place copies on the exit edges rather than
the backedge ones - seems not so from looking at coalesce_cost_edge.
So I think that we should remove the copy-propagation restrictions
and instead address this in out-of-SSA.
For now the following patch retains the exact same restriction in
DOM as it is present in copyprop (but not in FRE - ok my recent fault,
or in VRP). By avoiding to record the equivalency for PHIs
(where we know that either all or no uses should be covered by
the loop depth check) we retain the ability to record the equivalency
for the two loop exit PHI nodes and thus the threading (if only
on the false path).
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
I'll try to see what happens to the PR19038 testcases (though
that PR is a mess ...)
I checked the very original one (thin6d.f from sixtrack) and the
generated assembly for -Ofast is the same without any patch
and with _all_ loop_depth_of_name restrictions removed from
both DOM and copyprop (thus making loop_depth_of_name dead).
The cost of out-of-SSA copies for backedges (or in the case
of the PR, loop latch edges causing an edge split) is dealt
with by
/* Inserting copy on critical edge costs more than inserting it
elsewhere. */
if (EDGE_CRITICAL_P (e))
mult = 2;
in coalesce_cost_edge.
So in the end, without a testcase to investigate, I'd propose
to get rid of those restrictions. I'm still going forward
with the patch below for now.
Sounds good. Glad to see those hacks disappear.
Jeff