On Tue, 26 Sep 2017, Sebastian Pop wrote:

> On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener <rguent...@suse.de> wrote:
> 
> > On Fri, 22 Sep 2017, Sebastian Pop wrote:
> >
> > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguent...@suse.de> wrote:
> > >
> > > >
> > > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > > TLC.  It also adds a testcase I reduced from a stupid mistake I made
> > > > when reworking canonicalize_loop_closed_ssa.
> > > >
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > >
> > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > > -Ofast -march=haswell -floop-nest-optimize are
> > > >
> > > >  61 loop nests "optimized"
> > > >  45 loop nest transforms cancelled because of code generation issues
> > > >  21 loop nest optimizations timed out the 350000 ISL "operations" we allow
> > > >
> > > > I say "optimized" because the usual transform I've seen is static tiling
> > > > as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > > There's no way to automagically figure what kind of transform ISL did
> > > >
> > >
> > > Here is how to automate (without magic) the detection
> > > of the transform that isl did.
> > >
> > > The problem solved by isl is the minimization of strides
> > > in memory, and to do this, we need to tell the isl scheduler
> > > the validity dependence graph, in graphite-optimize-isl.c
> > > see the validity (RAW, WAR, WAW) and the proximity
> > > (RAR + validity) maps.  The proximity does include the
> > > read after read, as the isl scheduler needs to minimize
> > > strides between consecutive reads.

Ah, so I now see why we do not perform interchange on trivial cases like

double A[1024][1024], B[1024][1024];

void foo(void)
{
  for (int i = 0; i < 1024; ++i)
    for (int j = 0; j < 1024; ++j)
      A[j][i] = B[j][i];
}

which is probably because

  /* FIXME: proximity should not be validity.  */
  isl_union_map *proximity = isl_union_map_copy (validity);

falls apart when there is _no_ dependence?
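
For reference, a hand-written sketch of the interchange one would hope
for on the first testcase.  This is not GRAPHITE output; the names N,
foo_original and foo_interchanged are made up for illustration, and A
and B are made static only to keep the snippet self-contained.

```c
#include <string.h>

#define N 1024   /* Hypothetical name for the 1024 bound in the mail.  */

static double A[N][N], B[N][N];

/* Original nest: the inner loop varies the row index j, so consecutive
   accesses are N * sizeof (double) bytes apart in row-major C.  */
void foo_original (void)
{
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      A[j][i] = B[j][i];
}

/* Hand-interchanged nest (illustrative only): the inner loop now varies
   the column index i, so consecutive accesses are adjacent in memory.  */
void foo_interchanged (void)
{
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      A[j][i] = B[j][i];
}
```

Since the nest is a plain copy with no dependence, either order leaves
A equal to B; only the access stride differs.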

I can trick GRAPHITE into performing the interchange for

double A[1024][1024], B[1024][1024];

void foo(void)
{
  for (int i = 1; i < 1023; ++i)
    for (int j = 0; j < 1024; ++j)
      A[j][i] = B[j][i-1] + A[j][i+1];
}

because now there is a dependence.  Any idea on how to rewrite
scop_get_dependences to avoid "simplifying"?  I suppose the
validity constraints _do_ also specify a kind of proximity; we
just may not prune / optimize them in the same way as
dependences?
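
A rough sketch of why interchange stays legal for the second testcase.
The only dependence is the anti (WAR) dependence from the read of
A[j][i+1] to the later write of that same element.  In (i, j) order its
distance vector is (1, 0); after interchange it becomes (0, 1), which is
still lexicographically positive.  The function names and the array
parameters are hypothetical, introduced so that the two orders can be
compared side by side.

```c
#include <string.h>

#define N 1024   /* Hypothetical name for the 1024 bound in the mail.  */

/* Original order: iteration (i, j) reads a[j][i+1] before iteration
   (i+1, j) overwrites it.  */
void bar_original (double a[N][N], double b[N][N])
{
  for (int i = 1; i < N - 1; ++i)
    for (int j = 0; j < N; ++j)
      a[j][i] = b[j][i-1] + a[j][i+1];
}

/* Hand-interchanged order (illustrative only): within a row j, i still
   increases, so a[j][i+1] is again read before it is overwritten.  */
void bar_interchanged (double a[N][N], double b[N][N])
{
  for (int j = 0; j < N; ++j)
    for (int i = 1; i < N - 1; ++i)
      a[j][i] = b[j][i-1] + a[j][i+1];
}
```

Both versions perform the same additions on the same old values of A,
so they should produce identical arrays.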

Richard.
