Re: [PATCH][GRAPHITE] More TLC

Sebastian Pop Thu, 28 Sep 2017 11:47:36 -0700

On Wed, Sep 27, 2017 at 7:18 AM, Richard Biener <rguent...@suse.de> wrote:


> On Tue, 26 Sep 2017, Sebastian Pop wrote:
>
> > On Mon, Sep 25, 2017 at 8:12 AM, Richard Biener <rguent...@suse.de> wrote:
> >
> > > On Fri, 22 Sep 2017, Sebastian Pop wrote:
> > >
> > > > On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener <rguent...@suse.de> wrote:
> > > >
> > > > >
> > > > > > This simplifies canonicalize_loop_closed_ssa and does other minimal
> > > > > > TLC.  It also adds a testcase I reduced from a stupid mistake I
> > > > > > made when reworking canonicalize_loop_closed_ssa.
> > > > > >
> > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to
> > > > > > trunk.
> > > > > >
> > > > > > SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> > > > > > -Ofast -march=haswell -floop-nest-optimize are
> > > > > >
> > > > > >  61 loop nests "optimized"
> > > > > >  45 loop nest transforms cancelled because of code generation issues
> > > > > >  21 loop nest optimizations timed out the 350000 ISL "operations" we allow
> > > > > >
> > > > > > I say "optimized" because the usual transform I've seen is static
> > > > > > tiling as enforced by GRAPHITE according to --param loop-block-tile-size.
> > > > > > There's no way to automagically figure what kind of transform ISL did
> > > > >
> > > >
> > > > Here is how to automate (without magic) the detection
> > > > of the transform that isl did.
> > > >
> > > > The problem solved by isl is the minimization of strides
> > > > in memory, and to do this, we need to tell the isl scheduler
> > > > the validity dependence graph, in graphite-optimize-isl.c
> > > > see the validity (RAW, WAR, WAW) and the proximity
> > > > (RAR + validity) maps.  The proximity does include the
> > > > read after read, as the isl scheduler needs to minimize
> > > > strides between consecutive reads.
>
> Ah, so I now see why we do not perform interchange on trivial cases like
>
> double A[1024][1024], B[1024][1024];
>
> void foo(void)
> {
>   for (int i = 0; i < 1024; ++i)
>     for (int j = 0; j < 1024; ++j)
>       A[j][i] = B[j][i];
> }
>
> which is probably because
>
>   /* FIXME: proximity should not be validity.  */
>   isl_union_map *proximity = isl_union_map_copy (validity);
>
> falls apart when there is _no_ dependence?
>

You are right.  The proximity needs to account for spatial
locality as well if you want to interchange the loop.
To describe the spatial locality, I would recommend adding
to the proximity relation the accesses to adjacent array
elements, which we would like two successive iterations of
the innermost loop to touch:
A[j][i] -> A[j][i+1] and B[j][i] -> B[j][i+1]
With these two extra relations in the proximity map,
isl should be able to interchange the above loop.
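
For reference, here is a minimal sketch of the nest before and after
the interchange discussed above.  The function names and the stride
helpers are made up for this illustration; they are not GRAPHITE or
isl code, and the sketch only shows the result we would hope for, not
what the scheduler currently produces.

```c
#include <stddef.h>

#define N 1024

static double A[N][N], B[N][N];

/* Nest as quoted above: i is the outer loop, j the inner loop.
   Consecutive inner iterations walk down a column of A and B.  */
void copy_original (void)
{
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      A[j][i] = B[j][i];
}

/* Hoped-for result of the interchange: j outer, i inner.
   Consecutive inner iterations now walk along a row.  */
void copy_interchanged (void)
{
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      A[j][i] = B[j][i];
}

/* Distance, in doubles, between the elements of A touched by two
   consecutive inner iterations of copy_original (j -> j+1).  */
ptrdiff_t inner_stride_original (void)
{
  return ((char *) &A[1][0] - (char *) &A[0][0]) / (ptrdiff_t) sizeof (double);
}

/* Same distance for copy_interchanged (i -> i+1).  */
ptrdiff_t inner_stride_interchanged (void)
{
  return ((char *) &A[0][1] - (char *) &A[0][0]) / (ptrdiff_t) sizeof (double);
}
```

Both functions compute the same result since the statement carries no
dependence; only the stride of the inner loop differs (N doubles
versus one double).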


>
> I can trick GRAPHITE into performing the interchange for
>
> double A[1024][1024], B[1024][1024];
>
> void foo(void)
> {
>   for (int i = 1; i < 1023; ++i)
>     for (int j = 0; j < 1024; ++j)
>       A[j][i] = B[j][i-1] + A[j][i+1];
> }
>
> because now there is a dependence.  Any idea on how to rewrite
> scop_get_dependences to avoid "simplifying"?  I suppose the
> validity constraints _do_ also specify kind-of a proximity
>

Correct: the validity map specifies a subset of the data
reuse (it is missing the RAR dependences).


> we just may not prune / optimize them in the same way as
> dependences?
>

Validity constraints are there to "keep the wind blowing
in the same direction" after the transform; otherwise the
result of the transformed computation may be wrong.
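
To make this concrete, here is a small sketch built on the second
kernel quoted above.  Iteration i reads A[j][i+1], which iteration
i+1 then overwrites, so there is a WAR dependence carried by the i
loop.  Interchanging the loops keeps i increasing for every j and so
respects that dependence; running i backwards does not.  The array
size and all function names are assumptions made for this
illustration only.

```c
#include <string.h>

#define N 8

typedef void kernel_fn (double A[N][N], double B[N][N]);

/* Fill A and B with distinct, deterministic values.  */
void init (double A[N][N], double B[N][N])
{
  for (int r = 0; r < N; ++r)
    for (int c = 0; c < N; ++c)
      {
        A[r][c] = r * N + c;
        B[r][c] = 2 * (r * N + c) + 1;
      }
}

/* Kernel as quoted: i outer, j inner.  */
void run_original (double A[N][N], double B[N][N])
{
  for (int i = 1; i < N - 1; ++i)
    for (int j = 0; j < N; ++j)
      A[j][i] = B[j][i-1] + A[j][i+1];
}

/* Interchanged: for each j, i still increases, so every read of
   A[j][i+1] still happens before the write to it.  */
void run_interchanged (double A[N][N], double B[N][N])
{
  for (int j = 0; j < N; ++j)
    for (int i = 1; i < N - 1; ++i)
      A[j][i] = B[j][i-1] + A[j][i+1];
}

/* Reversed i loop: A[j][i+1] is written before it is read,
   which violates the WAR dependence.  */
void run_reversed_i (double A[N][N], double B[N][N])
{
  for (int i = N - 2; i >= 1; --i)
    for (int j = 0; j < N; ++j)
      A[j][i] = B[j][i-1] + A[j][i+1];
}

/* Return 1 if VARIANT computes the same A as run_original.  */
int same_as_original (kernel_fn *variant)
{
  double A0[N][N], B0[N][N], A1[N][N], B1[N][N];
  init (A0, B0);
  init (A1, B1);
  run_original (A0, B0);
  variant (A1, B1);
  return memcmp (A0, A1, sizeof A0) == 0;
}
```

With these definitions, same_as_original (run_interchanged) holds
while same_as_original (run_reversed_i) does not.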

The proximity map should contain a description of
- reuse of memory (temporal locality)
- how close together the accessed elements are (spatial locality).
isl will optimize for both if the proximity map has a description
of both.

For the moment the proximity map is initialized only with the
current validity constraints, as the FIXME comment you quoted
shows, so it describes only a subset of the temporal locality.
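
As a rough illustration of what an extra spatial-locality relation
would buy us, the toy below measures how far apart in execution time
two related statement instances end up under each loop order.  It
assumes the relation A[j][i] -> A[j][i+1] is expressed on statement
instances as S[i,j] -> S[i+1,j]; that encoding, the two schedule
functions, and the cost function are assumptions for this sketch and
not what isl or graphite-optimize-isl.c actually implement.

```c
#define N 1024

/* Logical execution time of instance S[i,j] when i is the outer
   loop (the original nest).  */
long time_i_outer (int i, int j)
{
  return (long) i * N + j;
}

/* Logical execution time of S[i,j] when j is the outer loop
   (the interchanged nest).  */
long time_j_outer (int i, int j)
{
  return (long) j * N + i;
}

/* Largest schedule distance over all related pairs
   S[i,j] -> S[i+1,j].  A smaller value means the two accesses to
   neighboring elements run closer together in time.  */
long max_proximity_distance (long (*sched) (int, int))
{
  long worst = 0;
  for (int i = 0; i < N - 1; ++i)
    for (int j = 0; j < N; ++j)
      {
        long d = sched (i + 1, j) - sched (i, j);
        if (d > worst)
          worst = d;
      }
  return worst;
}
```

Under time_i_outer the related instances are N iterations apart,
under time_j_outer they are adjacent, which is why a scheduler that
minimizes distances over the proximity relation would be expected to
prefer the interchanged order once that relation is present.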

Sebastian
