On Fri, 15 Mar 2013, Richard Biener wrote:

> On Thu, 14 Mar 2013, Richard Biener wrote:
>
> >
> > This extracts pieces from the already posted patch series that are
> > most worthwhile and applicable for backporting to both 4.8 and 4.7.
> > It also re-implements the limiting of the maximum number of memory
> > references to consider for LIM's dependence analysis.  This limiting
> > is now done per loop-nest and disables optimizing outer loops
> > only.  The limiting requires backporting introduction of the
> > shared unanalyzable mem-ref - it works by marking that as stored
> > in loops we do not want to compute dependences for - which makes
> > dependence computation for mems in those loops linear, as that
> > mem-ref, which conveniently has ID 0, is tested first.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > The current limit of 1000 datarefs is quite low (well, for LIM's
> > purposes, that is), and I only bothered to care about -O1 for
> > backports (no caching of the affine combination).  With the
> > limit in place and at -O1 LIM now takes
> >
> >  tree loop invariant motion:   0.55 ( 1%) usr
> >
> > for the testcase in PR39326.  Four patches in total, we might
> > consider not backporting the limiting, without it this
> > insane testcase has, at ~2GB memory usage (peak determined by IRA)
> >
> >  tree loop invariant motion: 533.30 (77%) usr
> >
> > but avoids running into the DSE / combine issue (and thus stays
> > manageable overall at -O1).  With limiting it requires -fno-dse
> > to not blow up (>5GB of memory use).
>
> Note that the limiting patch (below) causes code-generation differences
> because it collects memory-references in a different order and
> store-motion applies its transform in order of mem-ref IDs
> (different order of loads / stores and different decl UIDs).  The
> different ordering results in quite a big speedup because bitmaps
> have a more regular form (maybe only for this testcase though).
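To illustrate the limiting trick quoted above, here is a minimal standalone sketch in C.  It is not code from tree-ssa-loop-im.c: `struct loop_refs`, `toy_refs_may_alias`, `mark_loop_unanalyzable`, `ref_independent_in_loop` and `MAX_REFS` are all hypothetical names made up for this example, and the alias test is an arbitrary stand-in for the real oracle.  The only thing it tries to show is why marking the ID 0 reference as stored ends the scan of a loop's stored references at the first entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UNANALYZABLE_MEM_ID 0   /* the shared "unanalyzable" reference */
#define MAX_REFS 64             /* hypothetical bound for this sketch */

/* Hypothetical per-loop record: which reference IDs are stored in the loop. */
struct loop_refs {
    bool stored[MAX_REFS];
    size_t n_refs;              /* number of valid IDs, at most MAX_REFS */
};

/* Counts how often the (expensive) pairwise alias test is run. */
static unsigned long alias_queries;

/* Arbitrary stand-in for an alias-oracle query. */
static bool toy_refs_may_alias(size_t a, size_t b)
{
    alias_queries++;
    return (a % 2) == (b % 2);
}

/* Give up on a loop: record the unanalyzable reference as stored in it. */
void mark_loop_unanalyzable(struct loop_refs *loop)
{
    loop->stored[UNANALYZABLE_MEM_ID] = true;
}

/* Is REF independent of every reference stored in LOOP?  IDs are walked
   in increasing order, so a stored ID 0 is seen before anything else and
   the answer is "no" without a single alias query. */
bool ref_independent_in_loop(const struct loop_refs *loop, size_t ref)
{
    assert(loop->n_refs <= MAX_REFS && ref < loop->n_refs);
    for (size_t id = 0; id < loop->n_refs; id++) {
        if (!loop->stored[id])
            continue;
        if (id == UNANALYZABLE_MEM_ID)
            return false;       /* conflicts with everything */
        if (id != ref && toy_refs_may_alias(ref, id))
            return false;
    }
    return true;
}
```

In this model each query in a limited loop costs one step, so answering it for all N references of that loop is linear in N rather than quadratic.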
I have now committed the first two patches to trunk as r196768.

Richard.

2013-03-18  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/39326
	* tree-ssa-loop-im.c (refs_independent_p): Exploit symmetry.
	(struct mem_ref): Replace mem member with ao_ref typed member.
	(MEM_ANALYZABLE): Adjust.
	(memref_eq): Likewise.
	(mem_ref_alloc): Likewise.
	(gather_mem_refs_stmt): Likewise.
	(mem_refs_may_alias_p): Use the ao_ref to query the alias oracle.
	(execute_sm_if_changed_flag_set): Adjust.
	(execute_sm): Likewise.
	(ref_always_accessed_p): Likewise.
	(refs_independent_p): Likewise.
	(can_sm_ref_p): Likewise.
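
For readers wondering what "Exploit symmetry" can buy, a minimal standalone sketch follows.  It is an illustration under my own assumptions, not the committed change: `pair_cache`, `toy_may_alias`, `toy_refs_independent` and `N_REFS` are hypothetical names, and the alias test is an arbitrary symmetric stand-in.  Independence of two references does not depend on the order they are asked about, so ordering the pair by ID before the cache lookup lets one cached verdict serve both (a, b) and (b, a).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_REFS 16               /* hypothetical number of references */

enum verdict { UNKNOWN = 0, INDEPENDENT, DEPENDENT };

/* Only entries with row index < column index are ever used. */
static enum verdict pair_cache[N_REFS][N_REFS];

/* Counts how often the (expensive) alias test is run. */
static unsigned long oracle_calls;

/* Arbitrary symmetric stand-in for an alias-oracle query. */
static bool toy_may_alias(size_t a, size_t b)
{
    oracle_calls++;
    return (a + b) % 3 == 0;
}

bool toy_refs_independent(size_t a, size_t b)
{
    assert(a < N_REFS && b < N_REFS);
    if (a == b)
        return true;            /* trivial case in this sketch */

    /* Canonicalize the pair so the smaller ID comes first. */
    if (a > b) {
        size_t tmp = a;
        a = b;
        b = tmp;
    }

    if (pair_cache[a][b] == UNKNOWN)
        pair_cache[a][b] = toy_may_alias(a, b) ? DEPENDENT : INDEPENDENT;
    return pair_cache[a][b] == INDEPENDENT;
}
```

Asking about (2, 5) and then (5, 2) runs the alias test once here; without the swap it would run twice and need two cache entries.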