On Fri, 15 Mar 2013, Richard Biener wrote:

> On Thu, 14 Mar 2013, Richard Biener wrote:
> 
> > 
> > This extracts pieces from the already posted patch series that are
> > most worthwhile and applicable for backporting to both 4.8 and 4.7.
> > It also re-implements the limiting of the maximum number of memory
> > references to consider for LIM's dependence analysis.  This limiting
> > is now done per loop-nest and disables optimizing outer loops
> > only.  The limiting requires backporting the introduction of the
> > shared unanalyzable mem-ref - it works by marking that as stored
> > in loops we do not want to compute dependences for - which makes
> > dependence computation for mems in those loops linear, as that
> > mem-ref, which conveniently has ID 0, is tested first.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > The current limit of 1000 datarefs is quite low (well, for LIM's
> > purposes, that is), and I only bothered to care about -O1 for
> > backports (no caching of the affine combination).  With the
> > limit in place and at -O1 LIM now takes
> > 
> >  tree loop invariant motion:   0.55 ( 1%) usr
> > 
> > for the testcase in PR39326.  Four patches in total; we might
> > consider not backporting the limiting.  Without it, this
> > insane testcase has, at ~2GB memory usage (peak determined by IRA)
> > 
> >  tree loop invariant motion: 533.30 (77%) usr
> > 
> > but avoids running into the DSE / combine issue (and thus stays
> > manageable overall at -O1).  With limiting it requires -fno-dse
> > to not blow up (>5GB of memory use).
> 
> Note that the limiting patch (below) causes code-generation differences
> because it collects memory-references in a different order and
> store-motion applies its transform in order of mem-ref IDs
> (different order of loads / stores and different decl UIDs).  The
> different ordering results in quite a big speedup because bitmaps
> have a more regular form (maybe only for this testcase though).
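
To illustrate why marking the shared unanalyzable mem-ref as stored makes
the dependence computation linear, here is a minimal standalone sketch.  It
is not the code in tree-ssa-loop-im.c: the names loop_info, may_alias,
disable_dependence_analysis and ref_independent_in_loop are hypothetical,
and a plain bool array stands in for the bitmaps LIM uses.  The only point
it models is that the scan over stored refs starts at ID 0 and gives up
there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; ID 0 is reserved for the shared
   unanalyzable mem-ref, as described in the quoted mail.  */
#define UNANALYZABLE_ID 0
#define MAX_REFS 64

struct loop_info
{
  bool stored[MAX_REFS];  /* stored[id]: ref ID is stored in this loop.  */
  size_t n_refs;          /* Number of ref IDs in use (<= MAX_REFS).  */
};

/* Stand-in for an expensive alias-oracle query; counts its calls so the
   saving is observable.  */
static size_t alias_queries;

static bool
may_alias (size_t a, size_t b)
{
  alias_queries++;
  return a == b;
}

/* Opt a loop out of dependence analysis by pretending the unanalyzable
   ref is stored in it.  */
static void
disable_dependence_analysis (struct loop_info *loop)
{
  loop->stored[UNANALYZABLE_ID] = true;
}

/* Return true if REF is independent of every ref stored in LOOP.  The
   scan is in ID order, so a stored ID 0 ends it after one step.  */
static bool
ref_independent_in_loop (const struct loop_info *loop, size_t ref)
{
  assert (loop->n_refs <= MAX_REFS);
  for (size_t id = 0; id < loop->n_refs; id++)
    {
      if (!loop->stored[id])
        continue;
      if (id == UNANALYZABLE_ID)
        return false;  /* Conflicts with everything; no oracle query.  */
      if (may_alias (ref, id))
        return false;
    }
  return true;
}
```

In this model each query against a disabled loop costs a single array
lookup, so checking N refs is O(N) overall rather than up to N * N oracle
queries.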

I have now committed the first two patches to trunk as r196768.

Richard.

2013-03-18  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/39326
        * tree-ssa-loop-im.c (refs_independent_p): Exploit symmetry.
        (struct mem_ref): Replace mem member with ao_ref typed member.
        (MEM_ANALYZABLE): Adjust.
        (memref_eq): Likewise.
        (mem_ref_alloc): Likewise.
        (gather_mem_refs_stmt): Likewise.
        (mem_refs_may_alias_p): Use the ao_ref to query the alias oracle.
        (execute_sm_if_changed_flag_set): Adjust.
        (execute_sm): Likewise.
        (ref_always_accessed_p): Likewise.
        (refs_independent_p): Likewise.
        (can_sm_ref_p): Likewise.
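
For readers unfamiliar with the "Exploit symmetry" entry above, the
following is a rough sketch of the general idea, under the assumption that
it means caching each pair of refs only once, in canonical (lower ID
first) order.  It is not taken from the patch: dep_cache, oracle_may_alias
and refs_independent are hypothetical names, and a small 2-D array
replaces the per-ref bitmaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 16

enum dep_state { DEP_UNKNOWN = 0, DEP_INDEPENDENT, DEP_DEPENDENT };

/* Only entries with first index < second index are ever used.  */
static enum dep_state dep_cache[MAX_REFS][MAX_REFS];

/* Stand-in for the alias oracle; counts calls.  In this toy model two
   refs alias when their IDs have the same parity.  */
static size_t oracle_calls;

static bool
oracle_may_alias (size_t a, size_t b)
{
  oracle_calls++;
  return (a % 2) == (b % 2);
}

/* Independence is symmetric, so order the pair before consulting the
   cache.  (A, B) and (B, A) then share one entry and one oracle query.  */
static bool
refs_independent (size_t a, size_t b)
{
  assert (a < MAX_REFS && b < MAX_REFS);
  if (a == b)
    return true;  /* A ref is not tested against itself in this model.  */
  if (a > b)
    {
      size_t tmp = a;
      a = b;
      b = tmp;
    }
  if (dep_cache[a][b] == DEP_UNKNOWN)
    dep_cache[a][b] = oracle_may_alias (a, b) ? DEP_DEPENDENT
                                              : DEP_INDEPENDENT;
  return dep_cache[a][b] == DEP_INDEPENDENT;
}
```

Canonicalizing the pair halves the cache that has to be kept and means
the second query for a pair never reaches the oracle, whichever order the
arguments come in.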
