On 01/25/2017 03:34 AM, Richard Biener wrote:
On Tue, Jan 24, 2017 at 4:05 PM, Jeff Law <l...@redhat.com> wrote:
On 01/24/2017 07:29 AM, Marc Glisse wrote:

On Tue, 24 Jan 2017, Richard Biener wrote:

That was my thought as well, but AFAICT we only call into match.pd
from VRP if we changed the insn.


Yes - there were thoughts about changing that (but it comes at an expense).
Basically we'd like to re-fold stmts that indirectly use stmts we
changed.  We certainly don't want to re-fold everything all the time.


VRP is kind of a special case: every variable for which it finds a
new/improved range could be considered changed, since the new range may
trigger some extra transformation in match.pd (same for CCP and the
nonzero mask).

But that would assume that match.pd is relying on range information and
could thus use the improved range information.  *If* match.pd is using the
range information generated by VRP, it's not terribly pervasive.

But waiting until forwprop3 means we're leaving a ton of useless blocks and
statements in the IL for this testcase, and likely other code using
std::vector.

Perhaps rather than open-coding a fix in VRP I could have VRP call into
match.pd slightly more aggressively (say just for gimple_cond).  That may be
enough to capture the effects much earlier in the pipeline without trying to
fold *everything*.

Sure, the only disadvantage of doing it in VRP (in vrp_fold_stmt) is that you
may end up doing it twice.
Once per VRP pass doesn't seem excessive.

If we simplify in VRP with a valueizer that walks up the ASSERT_EXPRs, then VRP1 will simplify the two key conditionals. The first DOM pass is then able to clean up the whole mess. But that valueizer runs afoul of maybe_set_nonzero_bits's assumptions for an unrelated testcase (pr60482).

maybe_set_nonzero_bits has restrictions on the number of uses of an SSA_NAME. Folding with a valueizer that walks the ASSERT_EXPR chain has the side effect of copy-propagating through ASSERT_EXPRs. So for the pr60482 testcase we end up with 3 uses of "n_12" rather than the expected 2. That in turn causes us to avoid aggressively clearing bits in the nonzero bitmask of n_12. That in turn causes us to fail to eliminate a conditional, which in turn means we need a loop epilogue for vectorization. Ugh.

If we fold in VRP1 without walking up the ASSERT_EXPRs, we transform just the first conditional in VRP1. A goodly amount of simplification is still done in the first DOM pass, but not all of it.

forwprop3 then transforms the second conditional which PRE is then able to optimize away. That's early enough to allow sinking of the arithmetic.

The first DOM pass still cleaned up most of the crud early so we're avoiding useless work. The final result is the same as with the ASSERT_EXPR walking valueizer. That seems like a reasonable compromise.

Spinning that version...

jeff
