On Tue, 26 Sep 2017, Sebastian Pop wrote:

> On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener <rguent...@suse.de> wrote:
>
> > On Fri, 22 Sep 2017, Sebastian Pop wrote:
> >
> > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguent...@suse.de> wrote:
> > >
> > > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > > TLC.  It also adds a testcase I reduced from a stupid mistake I made
> > > > when reworking canonicalize_loop_closed_ssa.
> > > >
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > >
> > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > > -Ofast -march=haswell -floop-nest-optimize are
> > > >
> > > >  61 loop nests "optimized"
> > > >  45 loop nest transforms cancelled because of code generation issues
> > > >  21 loop nest optimizations timed out the 350000 ISL "operations" we allow
> > > >
> > > > I say "optimized" because the usual transform I've seen is static tiling
> > > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > > There's no way to automagically figure what kind of transform ISL did
> > >
> > > Here is how to automate (without magic) the detection
> > > of the transform that isl did.
> > >
> > > The problem solved by isl is the minimization of strides
> > > in memory, and to do this, we need to tell the isl scheduler
> > > the validity dependence graph, in graphite-optimize-isl.c
> > > see the validity (RAW, WAR, WAW) and the proximity
> > > (RAR + validity) maps.  The proximity does include the
> > > read after read, as the isl scheduler needs to minimize
> > > strides between consecutive reads.

Ah, so I now see why we do not perform interchange on trivial cases like

  double A[1024][1024], B[1024][1024];

  void foo(void)
  {
    for (int i = 0; i < 1024; ++i)
      for (int j = 0; j < 1024; ++j)
        A[j][i] = B[j][i];
  }

which is probably because

  /* FIXME: proximity should not be validity.  */
  isl_union_map *proximity = isl_union_map_copy (validity);

falls apart when there is _no_ dependence?

I can trick GRAPHITE into performing the interchange for

  double A[1024][1024], B[1024][1024];

  void foo(void)
  {
    for (int i = 1; i < 1023; ++i)
      for (int j = 0; j < 1024; ++j)
        A[j][i] = B[j][i-1] + A[j][i+1];
  }

because now there is a dependence.

Any idea on how to rewrite scop_get_dependences to avoid "simplifying"?
I suppose the validity constraints _do_ also specify kind-of a proximity
we just may not prune / optimize them in the same way as dependences?

Richard.
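
For reference, the interchange hoped for in the first example can be written by hand. The following is only a sketch to make the stride argument concrete, not what GRAPHITE or isl would emit; the function names (copy_original, copy_interchanged, the two inner_stride_* helpers and check_interchange_preserves_result) are made up for illustration. Since that nest carries no dependence at all, swapping the loops is trivially legal and only changes the inner-loop stride from one row (1024 doubles) to one element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 1024

double A[N][N], B[N][N];

/* The nest as written in the mail: i outer, j inner.  Consecutive inner
   iterations touch A[j][i] and A[j+1][i], which are one full row apart.  */
void copy_original(void)
{
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      A[j][i] = B[j][i];
}

/* Hand-interchanged nest: j outer, i inner.  Consecutive inner iterations
   touch A[j][i] and A[j][i+1], which are adjacent in memory.  */
void copy_interchanged(void)
{
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      A[j][i] = B[j][i];
}

/* Byte distance between the stores of two consecutive inner iterations.  */
ptrdiff_t inner_stride_original(void)
{
  return (char *) &A[1][0] - (char *) &A[0][0];
}

ptrdiff_t inner_stride_interchanged(void)
{
  return (char *) &A[0][1] - (char *) &A[0][0];
}

/* Hypothetical self-check: both loop orders must produce the same A.  */
void check_interchange_preserves_result(void)
{
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      B[j][i] = (double) (j * N + i);

  memset(A, 0, sizeof A);
  copy_original();
  assert(memcmp(A, B, sizeof A) == 0);

  memset(A, 0, sizeof A);
  copy_interchanged();
  assert(memcmp(A, B, sizeof A) == 0);
}
```

Calling check_interchange_preserves_result() from a small main() confirms both orders compute the same result; the two stride helpers show the quantity a proximity constraint on the accesses would let the scheduler minimize.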