On 01/25/2017 03:34 AM, Richard Biener wrote:
On Tue, Jan 24, 2017 at 4:05 PM, Jeff Law <l...@redhat.com> wrote:
On 01/24/2017 07:29 AM, Marc Glisse wrote:

On Tue, 24 Jan 2017, Richard Biener wrote:

That was my thought as well, but AFAICT we only call into match.pd
from VRP if we changed the insn.


Yes - there were thoughts about changing that (but it comes at an expense).
Basically we'd like to re-fold stmts that indirectly use stmts we
changed.  We certainly don't want to re-fold everything all the time.


VRP is kind of a special case: every variable for which it finds a
new/improved range could be considered changed, since the new range may
trigger some extra transformation in match.pd (same for CCP and the
nonzero mask).

But that would assume that match.pd is relying on range information and
could thus use the improved range information.  *If* match.pd is using the
range information generated by VRP, it's not terribly pervasive.

But waiting until forwprop3 means we're leaving a ton of useless blocks and
statements in the IL for this testcase, and likely other code using
std::vector.

Perhaps rather than open-coding a fix in VRP I could have VRP call into
match.pd slightly more aggressively (say just for gimple_cond).  That may be
enough to capture the effects much earlier in the pipeline without trying to
fold *everything*.

Sure, the only disadvantage of doing it in VRP (in vrp_fold_stmt) is that you
may end up doing it twice.
Once per VRP pass doesn't seem excessive.

If we simplify in VRP with a valueizer that walks up the ASSERT_EXPRs, then VRP1 will simplify the two key conditionals. The first DOM pass is then able to clean up the whole mess. But that valueizer runs afoul of maybe_set_nonzero_bits's assumptions for an unrelated testcase (pr60482).

maybe_set_nonzero_bits has restrictions on the number of uses of an SSA_NAME. Folding with a valueizer that walks the ASSERT_EXPR chain has the side effect of copy-propagating through ASSERT_EXPRs. So for the pr60482 testcase we end up with 3 uses of "n_12" rather than the expected 2. That in turn causes us to avoid aggressively clearing bits in the nonzero bitmask of n_12. That in turn causes us to fail to eliminate a conditional, which in turn means we need a loop epilogue for vectorization. Ugh.

If we fold in VRP1 without walking up the ASSERT_EXPRs, we transform just the first conditional in VRP1. A goodly amount of simplification is still done in the first DOM pass, but not all of it.

forwprop3 then transforms the second conditional which PRE is then able to optimize away. That's early enough to allow sinking of the arithmetic.

The first DOM pass still cleaned up most of the crud early so we're avoiding useless work. The final result is the same as with the ASSERT_EXPR walking valueizer. That seems like a reasonable compromise.

Spinning that version...

jeff
